Iterative method of at least five cycles for the refolding of proteins

ABSTRACT

A novel, generally applicable method for producing correctly folded proteins from a mixture of misfolded proteins, e.g. bacterial inclusion-body aggregates. A major new aspect of the method is that overall efficiency is achieved by subjecting proteins to a time-sequence of multiple denaturation-renaturation cycles, resulting in gradual accumulation of the correctly folded protein. The method has proven efficient for a variety of recombinant proteins. Also provided are novel encrypted recognition sites for bovine coagulation factor X_(a). The encrypted recognition sites described may be activated in vitro by controlled oxidation or by reversible derivatization of cysteine residues and thereby generate new cleavage sites for factor X_(a). Two new recombinant serine proteases exhibiting narrow substrate specificity for factor X_(a) recognition sites are also provided. They may replace natural coagulation factor X_(a) for cleavage of chimeric proteins.

This is a divisional of application Ser. No. 08/192,060, filed Feb. 4, 1994 (now abandoned).

FIELD OF THE INVENTION

This invention relates to recombinant DNA technology and, in particular, to protein engineering technologies for the production of correctly folded proteins by expression of genes or gene fragments in a host organism, heterologous or homologous, as recombinant protein products, by describing novel general principles and methodology for efficient in vitro refolding of misfolded and/or insoluble proteins, including proteins containing disulphide bonds. This invention further relates to the refolding of unfolded or misfolded polypeptides of any other origin. The invention also relates to novel designs of encrypted recognition sites for factor X_(a) cleavage of chimeric proteins, sites that only become recognized after in vitro derivatization. Two analogues of bovine coagulation factor X_(a), suitable for small-, medium-, or large-scale technological applications involving specific cleavage of chimeric proteins at sites designed for cleavage by factor X_(a), are also provided. Finally, the invention relates to designs of reversible disulphide-blocking reagents, useful as auxiliary compounds for refolding of cysteine-containing proteins, including a general assay procedure by which such disulphide exchange reagents can be evaluated for suitability for this specific purpose.

GENERAL BACKGROUND OF THE INVENTION

Technologies for the production of virtually any polypeptide by introduction, by recombinant DNA methods, of a natural or synthetic DNA fragment coding for this particular polypeptide into a suitable host have been under intense development over the past fifteen years, and are at present essential tools for biochemical research and for a number of industrial processes for production of high-grade protein products for biomedical or other industrial use.

Four fundamental properties of biological systems render heterologous production of proteins possible:

(i) The functional properties of a protein are entirely specified by its three-dimensional structure, and, due to the molecular environment in the structure, manifested by chemical properties exhibited by specific parts of this structure.

(ii) The three-dimensional structure of a protein is, in turn, specified by the sequence information represented by the specific sequential arrangement of amino acid residues in the linear polypeptide chain(s). The structure information embedded in the amino acid sequence of a polypeptide is by itself sufficient, under proper conditions, to direct the folding process, of which the end product is the completely and correctly folded protein.

(iii) The linear sequence of amino acid residues in the polypeptide chain is specified by the nucleotide sequence in the coding region of the genetic material directing the assembly of the polypeptide chain by the cellular machinery. The translation table governing translation of nucleic acid sequence information into amino acid sequence is known and is almost universal among known organisms and hence allows nucleic acid segments coding for any polypeptide segment to direct assembly of polypeptide product across virtually any cross-species barrier.

(iv) Each type of organism relies on its own characteristic array of genetic elements present within its own genes to interact with the molecular machinery of the cell, which in response to specific intracellular and extracellular factors regulates the expression of a given gene in terms of transcription and translation.

In order to exploit the protein synthesis machinery of a host cell or organism to achieve substantial production of a desired recombinant protein product, it is therefore necessary to present the DNA segment coding for the desired product to the cell fused to control sequences recognized by the genetic control system of the cell.

The immediate fate of a polypeptide expressed in a host is influenced by the nature of the polypeptide, the nature of the host, and possible host organism stress states invoked during production of a given polypeptide. A gene product expressed at a moderate level and similar or identical to a protein normally present in the host cell will often undergo normal processing and accumulation in the appropriate cellular compartment or secretion, whichever is the natural fate of this endogenous gene product. In contrast, a recombinant gene product which is foreign to the cell or is produced at high levels often activates cellular defence mechanisms similar to those activated by heat shock or exposure to toxic amino acid analogues, pathways that have been designed by nature to help the cell to get rid of "wrong" polypeptide material by controlled intracellular proteolysis or by segregation of unwanted polypeptide material into storage particles ("inclusion bodies"). The recombinant protein in these storage particles is often deposited in a misfolded and aggregated state, in which case it becomes necessary to dissolve the product under denaturing and reducing conditions and then fold the recombinant polypeptide by in vitro methods to obtain a useful protein product.

Expression of eukaryotic genes in eukaryotic cells often allows the direct isolation of the correctly folded and processed gene product from cell culture fluids or from cellular material. This approach is often used to obtain relatively small amounts of a protein for biochemical studies and is presently also exploited industrially for production of a number of biomedical products. However, eukaryotic expression technology is expensive in terms of technological complexity and of labour and material costs. Moreover, the time scale of the development phase required to establish an expression system is at least several months, even for laboratory-scale production. The nature and extent of post-translational modification of the recombinant product often differ from those of the natural product because such modifications are under indirect genetic control in the host cell. Sequence signals invoking a post-synthetic modification are often mutually recognized among eukaryotes, but availability of the appropriate suite of modification enzymes is given by the nature and state of the host cell.

A variety of strategies have been developed for expression of gene products in prokaryotic hosts, which are advantageous over eukaryotic hosts in terms of capital, labour and material requirements. Strains of the eubacterium Escherichia coli are often preferred as host cells because E. coli is far better characterized genetically than any other organism, also at the molecular level.

Prokaryotic host cells do not possess the enzymatic machinery required to carry out post-translational modification, and a eukaryotic gene product will therefore necessarily be produced in its unmodified form. Moreover, the product must be synthesized with an N-terminal extension, at least one additional methionine residue arising from the required translation initiation codon, more often also including an N-terminal segment corresponding to that of a highly expressed host protein. General methods to remove such N-terminal extensions by sequence-specific proteolysis at linker segments inserted at the junction between the N-terminal extension and the desired polypeptide product have been described (Enterokinase-cleavable linker sequence: EP 035384, The Regents of the University of California; Factor X_(a)-cleavable linker sequence: EP 161937, Nagai & Thøgersen, Assignee: Celltech Ltd.).

Over the years a considerable effort has been directed at the development of strategies for heterologous expression in prokaryotes to generate recombinant protein products in a soluble form or fusion protein constructs that allow secretion from the cell in an active, possibly N-terminally processed form, an effort resulting in limited success only, despite recent developments in the chaperone field. Typically, much time and effort is required to develop and modify an expression system before even a small amount of soluble and correctly folded fusion protein product can be isolated. More often all of the polypeptide product is deposited within the host cell in an improperly folded state in "inclusion bodies". This is particularly true when expressing eukaryotic proteins containing disulphide bridges.

Available methods for in vitro refolding of proteins all describe processes in which the protein in solution or non-specifically adsorbed to ion exchange resins etc. is exposed to solvent, the composition of which is gradually changed over time from strongly denaturing (and possibly reducing) to non-denaturing in a single pass. This is often carried out by diluting a concentrated solution of protein containing 6-8 M guanidine hydrochloride or urea into a substantial volume of non-denaturing buffer, or by dialysis of a dilute solution of the protein in the denaturing buffer against the non-denaturing buffer. Numerous variants of this basic procedure have been described, including addition of specific ligands or cofactors of the active protein and incorporation of polymer substances like polyethylene oxide (polyethylene glycol), thought to stabilize the folded structure.

Although efficient variants of the standard in vitro refolding procedure have been found for a number of specific protein products, including proteins containing one or more disulphide bonds, refolding yields are more often poor, and scale-up is impractical and expensive because the low solubility of most incompletely folded proteins necessitates the use of excessive volumes of solvent.

The common characteristic of all traditional in vitro refolding protocols is that refolding induced by sudden or gradual reduction of denaturant is carried out as a single-pass operation, the yield of which is then regarded as the best obtainable for the protein in question.

The general field of protein folding has been summarized in a recent text book edited by Thomas E. Creighton ("Protein folding", ed. Creighton T. E., Freeman 1992) and a more specific review of practical methods for protein refolding was published in 1989 by Rainer Jaenicke & Rainer Rudolph (p. 191-223 in "Protein Structure, a practical approach", ed. T. E. Creighton, IRL Press 1989). Among the numerous more detailed publications, state-of-the-art reviews like those by Schein (Schein C. H., 1990, Bio/Technology 8, 308-317) or Buchner and Rudolph (Buchner J. and Rudolph R., 1991, Bio/Technology 9, 157-162) may be consulted.

In conclusion, there is a definite need for generally applicable high-yield methods for the refolding of un- or misfolded proteins derived from various sources, such as prokaryotic expression systems or peptide synthesis.

SUMMARY OF THE INVENTION

It has been found by the inventors that refolding yields can be greatly increased by taking into account that the protein folding process is a kinetically controlled process and that interconversion between folded, unfolded and misfolded conformers of the protein is subject to hysteresis and time-dependent phenomena that can be exploited to design a cyclic denaturation-renaturation process, in which refolded protein product accumulates incrementally in each cycle at the expense of unfolded and misfolded conformers, to generate a new refolding process of much greater potential than the basic traditional approach.

By the term "folded protein" is meant a polypeptide in (a) conformational state(s) corresponding to that or those occurring in the protein in its biologically active form or unique stable intermediates that in subsequent steps may be converted to generate the biologically active species. The covalent structure of the folded protein in terms of crosslinking between pairs of cysteine residues in the polypeptide is identical to that of the protein in its biologically active form.

Accordingly, the term "unfolded protein" refers to a polypeptide in conformational states less compact and well-defined than that or those corresponding to the protein in its biologically active, hence folded, form. The covalent structure of the unfolded protein in terms of crosslinking between pairs of cysteine residues in the polypeptide may or may not be identical to that of the protein in its biologically active form. Closely related to an unfolded protein is a "misfolded protein", which is a polypeptide in a conformational state which is virtually thermodynamically stable, sometimes even more so than that or those states corresponding to the protein in its folded form, but which does not exhibit the same degree, if any, of the biological activity of the folded protein. As is the case for the unfolded protein, the covalent structure in terms of crosslinking between pairs of cysteine residues in the polypeptide may or may not be the same as that of the folded protein.

By the term "refolded protein" is meant a polypeptide which has been converted from an unfolded state to attain its biologically active conformation and covalent structure in terms of crosslinking between correct pairs of cysteine residues in the polypeptide.

The new generally applicable protein refolding strategy has been designed on the basis of the following general properties of protein structure.

(a) The low solubility of unfolded proteins exposed to non-denaturing solvents reflects a major driving force inducing the polypeptide either to form the compact correctly refolded structure or to misfold and generate dead-end aggregates or precipitates, which are unable to refold and generate the correctly refolded structure under non-denaturing conditions within a reasonable amount of time.

(b) A newly formed dead-end aggregate is more easily "denatured", i.e. converted into an unfolded form, than the correctly refolded protein because the structure of the dead-end aggregate is more disordered. Probably misfolding is also in general a kinetically controlled process.

(c) An unfolded protein is often not (or only very slowly) able to refold into the correctly refolded form at denaturant levels required to denature dead-end aggregates within a reasonable amount of time.

(d) The body of evidence available to support (b) includes detailed studies of folding and unfolding pathways and intermediates for several model proteins. Also illustrative is the observation made for many disulphide-bonded proteins that the stability of disulphide bonds against reduction at limiting concentrations of reducing and denaturing agents is often significantly different for each disulphide bridge of a given protein, and that the disulphide bridges in the folded protein are in general much less prone to reduction or disulphide exchange than "non-active" disulphide bonds in a denatured protein or protein aggregate.

The new strategy for a refolding procedure is most easily illustrated by way of the following theoretical example:

Consider a hypothetical protein--stably folded in a non-denaturing buffer "A" and stably unfolded in the strongly denaturing buffer "B" (being e.g. a buffer containing 6 M guanidine-HCl)--exposed to buffer A or to buffer B and then subjected to incubation at intermediate levels of denaturation in mixtures of buffers A and B.

Levels between e.g. 100 and 75% B lead to conversion of both folded protein and dead-end aggregated protein to the unfolded form within a short period of time.

Levels between e.g. 75 and 50% B lead to conversion of newly formed dead-end aggregate to the unfolded form, whereas almost all refolded protein remains in a native-like structure, stable for at least a period of hours, from which it may snap back into the refolded form upon removal of the denaturant.

Levels in excess of 10% B prevent rapid formation of refolded form from unfolded form.

A solvent composition step from 100% B to 0% B converts unfolded protein to dead-end aggregate (75% yield) and refolded protein (25% yield).

Let us now subject a sample of this protein, initially in its unfolded form in 100% B, to a time-series of programmed denaturation-renaturation cycles as illustrated in FIG. 1, each consisting of a renaturation phase (F_(n)) (<10% B) and a denaturation phase (D_(n)). At the end of the renaturation phase of cycle (i) the denaturant content is changed to a level k_(i)% lower than the denaturant level of the previous cycle. Following a brief incubation the denaturant is again removed, and the next renaturation phase F_(i+1) entered. Assuming the denaturation level starts out at 100% B and k_(i) for each cycle is fixed at 4%, this recipe will generate a damped series of "denaturation steps" dying out after 25 cycles.
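The damped series of denaturation steps described above can be sketched as follows. This is a minimal illustration only, assuming that the initial 100% B state precedes the first cycle, so that the first denaturation phase is at 96% B and the last at 0% B; the function name and its parameters are hypothetical and not part of the disclosure.

```python
def denaturation_schedule(start_percent_b=100, step_k=4):
    """Return the denaturant level (% buffer B) of each denaturation phase D_n.

    Assumption: each D_n is step_k percentage points lower than the
    previous level, and the series ends once the level reaches 0% B.
    """
    levels = []
    level = start_percent_b
    while level > 0:
        level = max(level - step_k, 0)
        levels.append(level)
    return levels


levels = denaturation_schedule()
print(len(levels))   # number of cycles before the series dies out
print(levels[:7])    # denaturation levels of the first seven cycles
```

With the values of the example (start at 100% B, k fixed at 4%), the list holds 25 entries, in line with the series dying out after 25 cycles.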

Through 25 cycles, as outlined above, the accumulation of refolded protein would progress as follows:

In cycles 1 to 6 all of the protein, folded as well as misfolded, will become unfolded in each of the denaturation phases D_(n).

Cycles 7 through 12: Dead-end aggregates will be converted to unfolded protein in each step whereas protein recoverable as refolded product will accumulate in the following amounts, cycle by cycle: 25%, 44%, 58%, 68%, 76% and 82%.

No further conversions take place through cycles 13 to 25.

The cyclic refolding process would therefore produce a total refolding yield of over 80%, whereas traditional one-pass renaturation at best would produce a yield of 25%.
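The cycle-by-cycle figures of the theoretical example follow from a simple recurrence: in each productive cycle, 25% of the protein that is not yet refolded is converted to refolded product, and refolded product is retained. The sketch below, in which the function name and parameters are illustrative assumptions, takes the six productive cycles stated in the example as given and does not model how boundary cycles are counted.

```python
def accumulated_yield(productive_cycles, single_pass_yield=0.25):
    """Fraction of refolded protein after each productive cycle.

    Assumption (all-or-none model of the example): refolded protein
    survives every denaturation phase of the productive range, and all
    remaining protein re-enters renaturation as unfolded material.
    """
    refolded = 0.0
    yields = []
    for _ in range(productive_cycles):
        refolded += single_pass_yield * (1.0 - refolded)
        yields.append(refolded)
    return yields


percentages = [round(100 * y) for y in accumulated_yield(6)]
print(percentages)  # [25, 44, 58, 68, 76, 82]
```

Equivalently, the yield after n productive cycles is 1 - 0.75^n, which exceeds 80% at n = 6.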

It will be appreciated that a great number of simplifying approximations in terms of all-or-none graduation of each characteristic of the various conformational states of the hypothetical protein have been made. The basic working principle, nevertheless, remains similar if a more complicated set of presumptions is incorporated in the model.

Arranging a practical setup for establishing a cyclic denaturation/renaturation protein refolding process can be envisaged in many ways.

The protein in solution could e.g. be held in an ultrafiltration device, held in a dialysis device or be confined to one of the phases of a suitable aqueous two-phase system, all of which might allow the concentration of low-molecular weight chemical solutes in the protein solution to be controlled by suitable devices.

Alternatively, the protein could be adsorbed to a suitable surface in contact with a liquid phase, the chemical composition of which could be controlled as required. A suitable surface could e.g. be a filtration device, a hollow-fibre device or a beaded chromatographic medium. Adsorption of the protein to the surface could be mediated by non-specific interactions, e.g. as described in WO 86/05809 (Thomas Edwin Creighton), by folding-compatible covalent bonds between surface and protein or via specific designs of affinity handles in a recombinant derivative of the protein exhibiting a specific and denaturation-resistant affinity for a suitable derivatized surface.

The specific implementation of the cyclic denaturation/renaturation protein refolding process established to investigate the potential of the general method was based on a design of cleavable hybrid proteins (EP 161937, Nagai & Thøgersen, Assignee: Celltech Ltd.) containing a metal affinity handle module (EP 0282042 (Heinz Döbeli, Bernhard Eggimann, Reiner Gentz, Erich Hochuli; Hoffmann-La Roche)) inserted N-terminally to the designed factor X_(a) cleavage site. Recombinant proteins of this general design, adsorbed on nickel-chelating agarose beads, could then be subjected to the present cyclic refolding process in a chromatographic column "refolding reactor" perfused with a mixture of suitable denaturing and non-denaturing buffers, delivered by an array of calibrated pumps, the flow rates of which were time-programmed through computer control.

A general scheme of solid-stage refolding entails cycling the immobilized protein as outlined above or by any other means and implementations between denaturing and non-denaturing conditions in a progressive manner, in which the concentration of the denaturing agent is gradually reduced from high starting values towards zero over a train of many renaturation-denaturation cycles. Using this approach it is not necessary to determine precisely which limiting denaturant concentration is required to obtain folding yield enrichment in the course of cycling of the specific protein at hand, because the progressive train of cycles will go through (up to) three phases: an early phase in which folded product present at the end of cycle (i) is completely denatured at the denaturation step of cycle (i+1), an intermediate productive phase during which refolded protein accumulates in increasing quantity, and a late phase during which the concentration of denaturant is too low to perturb the refolded protein or any remaining misfolded structures. Subjecting the protein to a progressing series of denaturation-renaturation cycles as outlined will therefore include several productive cycles.

For disulphide-containing proteins progressive denaturation-renaturation cycling may be enhanced by using equipment similar to advanced chromatography equipment with on-line facilities to monitor buffer compositions of folding reactor effluent. Information on effluent composition with regard to reductant and disulphide reshuffling reagent concentration profile would reveal productive cycling, and could therefore be used as input to an intelligent processor unit, in turn regulating the progression of denaturant concentration in a feed-back loop to ensure that most of the cycling effort is spent within the productive phase of the denaturation-renaturation cycle train. Such auto-optimization of cycling conditions would be possible because the analytical system may be used to measure extent and direction of changes in redox equilibrium in the buffer stream, measurements that directly reflect titration of thiol-groups/disulphide equivalents in the immobilized protein sample, and are therefore directly translatable into average number of disulphide bonds being disrupted or formed during the various phases of a cycle.
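One way such a feed-back loop might regulate the progression of denaturant concentration is sketched below. This is purely illustrative: the specification does not define a control rule, so the function, the `productive` flag (assumed to be derived upstream from the effluent redox measurements) and both step sizes are hypothetical.

```python
def next_denaturant_level(current_level, productive,
                          coarse_step=4.0, fine_step=1.0):
    """Choose the denaturant level (% buffer B) for the next denaturation phase.

    Hypothetical rule: while the effluent analysis reports productive
    cycling, lower the denaturant in small steps so that more cycles are
    spent in the productive range; otherwise move on in larger steps.
    """
    step = fine_step if productive else coarse_step
    return max(current_level - step, 0.0)


# Example: early non-productive cycle, then a productive one.
print(next_denaturant_level(96.0, productive=False))
print(next_denaturant_level(72.0, productive=True))
```

The only design choice illustrated is that the step size, rather than the cycle count, is the controlled variable, so the train of cycles still progresses monotonically towards zero denaturant.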

Other possible inputs for the intelligent processor controlling the progression of cycling include measurements of ligand binding, substrate conversion, antibody binding ability and, indeed, any other soluble agent interacting in distinct ways with misfolded and folded protein, which in the assessing stage of folding measurement might be percolated through the refolding reactor and then monitored in-line in the effluent by suitable analytical devices.

An intelligent monitoring and control system could furthermore use the available information to direct usable portions of reactor effluent to salvage/recycling subsystems, thereby minimizing expenses for large-scale operations.

After execution of the folding procedure the final product may be eluted from the affinity matrix in a concentrated form, processed to liberate the mature authentic protein by cleavage at the designed protease cleavage site and then subjected to final work-up using standard protein purification and handling techniques, well-known within the field of protein chemistry.

DETAILED DISCLOSURE OF THE INVENTION

Thus, the present invention relates to a method for generating a processed ensemble of polypeptide molecules, in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular uniform conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, comprising subjecting the initial ensemble of polypeptide molecules to a series of at least two successive cycles each of which comprises a sequence of

1) at least one denaturing step involving conditions exerting a denaturing influence on the polypeptide molecules of the ensemble followed by

2) at least one renaturing step involving conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step.

In the present specification and claims, the term "ensemble" is used in the meaning it has acquired in the art, that is, it designates a collection of molecules having essential common features. Initially ("an initial ensemble"), they have at least their amino acid sequence in common (and of course retain this common feature). When the ensemble of polypeptide molecules has been treated in the method of the invention (to result in "a processed ensemble"), the conformational states represented in the ensemble will contain a substantial fraction of polypeptide molecules with one particular conformation. As will be understood from the discussion which follows, the substantial fraction of polypeptide molecules with one particular conformation in the processed ensemble may vary dependent on the parameters of the treatment by the method of the invention, the size of the protein in the particular conformation, the length and identity of the amino acid sequence of the molecules, etc. In the examples reported herein, in which the process parameters have not yet been optimized, the fraction of polypeptide molecules with one particular conformation varied between 15% and 100% of the ensemble, which in all cases is above what could be obtained prior to the present invention. In example 13 it is further demonstrated that purification of the polypeptide molecules prior to their subjection to the method of the invention increases the fraction of polypeptide molecules with one particular conformation.

"Denaturing step" refers to exposure of an ensemble of polypeptidemolecules during a time interval to physical and/or chemicalcircumstances which subject the ensemble of polypeptide molecules toconditions characterized by more severe denaturing power than thosecharacterizing conditions immediately prior to the denaturing step.

Accordingly, the term "renaturing step" refers to exposure of an ensemble of polypeptide molecules during a time interval to physical and/or chemical circumstances which subject the ensemble of polypeptide molecules to conditions characterized by less severe denaturing power than those characterizing conditions immediately prior to the renaturing step.

It will be understood that the "substantial fraction" mentioned above will depend in magnitude on the ensemble of polypeptide molecules which are subjected to the method of the invention. If the processed ensemble of polypeptides consists of monomeric proteins of relatively short lengths and without intramolecular disulphide bridges the method will in general result in very high yields, whereas complicated molecules (such as polymeric proteins with a complicated disulphide bridging topology) may result in lower yields, even if the conditions of the method of the invention are fully optimized.

An interesting aspect of the invention relates to a method described above wherein the processed ensemble comprises a substantial fraction of polypeptide molecules in one conformational state, the substantial fraction constituting at least 1% (w/w) of the initial ensemble of polypeptide molecules. Higher yields are preferred, such as at least 5%, at least 10%, at least 20%, and at least 25% of the initial ensemble of polypeptide molecules. More preferred are yields of at least 30%, such as at least 40%, 50%, 60%, 70%, and at least 80%. Especially preferred are yields of at least 85%, such as 90%, 95%, 97%, and even 99%. Sometimes yields close to 100% are observed.

When the polypeptide molecules of the ensemble contain cysteine, the processed ensemble will comprise a substantial fraction of polypeptide molecules in one particular uniform conformation which in addition have substantially identical disulphide bridging topology.

In most cases, the polypeptide molecules subjected to the method of the invention will be molecules which have an amino acid sequence identical to that of an authentic polypeptide, or molecules which comprise an amino acid sequence corresponding to that of an authentic polypeptide joined to one or two additional polypeptide segments.

By the term "authentic protein or polypeptide" is meant a polypeptide with primary structure, including N- and C-terminal structures, identical to that of the corresponding natural protein. The term also denotes a polypeptide which has a known primary structure which is not necessarily identical to that of a natural protein, which polypeptide is the intentional end-product of a protein synthesis.

By the term "natural protein" is meant a protein as isolated in biologically active form from an organism, in which it is present not as a consequence of genetic manipulation.

In contrast, the term "artificial protein or polypeptide" as used in the present specification and claims is intended to relate to a protein/polypeptide which is not available from any natural sources, i.e. it cannot be isolated and purified from any natural source. An artificial protein/polypeptide is thus the result of human intervention, and may for instance be a product of recombinant DNA manipulation or a form of an in vitro peptide synthesis. According to the above definitions such an artificial protein may be an authentic protein, but not a natural protein.

Thus, the invention also relates to a method wherein natural proteins as well as artificial proteins are subjected to the refolding processes described herein.

As will be explained in greater detail below, it may be advantageous for various reasons that the authentic polypeptide is joined to polypeptide segments having auxiliary functions during the cycling and other previous or subsequent processing, e.g. as "handles" for binding the polypeptide to a carrier, as solubility modifiers, as expression boosters which have exerted their beneficial function during translation of messenger RNA, etc. Such an auxiliary polypeptide segment will preferably be linked to the authentic polypeptide via a cleavable junction, and where two such auxiliary polypeptide segments are linked to the authentic polypeptide, this may be via similar cleavable junctions which will normally be cleaved simultaneously, or through dissimilar cleavable junctions which may be cleaved in any time sequence.

In accordance with what is explained above, it is believed to be a major novel characteristic feature of the present invention that the cycling (which, as explained above, comprises at least two successive cycles) will give rise to at least one event where a renaturing step is succeeded by a denaturing step where at least a substantial fraction of the refolded polypeptides will be denatured again.

In most cases, the processing will comprise at least 3 cycles, often at least 5 cycles and more often at least 8 cycles, such as at least 10 cycles and, in some cases, at least 25 cycles. On the other hand, the series of cycles will normally not exceed 2000 cycles and will often comprise at most 1000 cycles and more often at most 500 cycles. The number of cycles used will depend partly on the possibilities made available by the equipment in which the cycling is performed.

Thus, if the cycling treatment is performed with the polypeptide molecules immobilized to a carrier column, such as will be explained in greater detail below, the rate with which the liquid phase in contact with the column can be exchanged will constitute one limit to what can realistically be achieved. On the other hand, high performance liquid chromatography (HPLC) equipment will permit very fast exchange of the liquid environment and thus make cycle numbers in the range of hundreds or thousands realistic.

Other considerations determining the desirable number of cycles are, e.g., inherent kinetic parameters such as interconversion between cis and trans isomers at proline residues, which will tend to complicate redistribution over the partially folded states and will thus normally require due consideration of timing. Another time-critical characteristic resides in the kinetics of disulphide reshuffling (cf. the discussion below of disulphide-reshuffling systems).

With due consideration of the above, the cycling series will often comprise at most 200 cycles, more often at most 100 cycles and yet more often at most 50 cycles.

In accordance with what is stated above, the duration of each denaturing step may be a duration which, under the particular conditions in question, is at least one millisecond and at most one hour, and the duration of each renaturing step may be a duration which, under the particular conditions in question, is at least 1 second and at most 12 hours.

In most embodiments of the method, the denaturing conditions of each individual denaturing step are kept substantially constant for a period of time, and the renaturing conditions of each individual renaturing step are kept substantially constant for a period of time, the periods of time during which conditions are kept substantially constant being separated by transition periods during which the conditions are changed. The transition period between steps for which conditions are kept substantially constant may have a duration varying over a broad range, such as between 0.1 second and 12 hours, and will normally be closely adapted to the durations of the denaturing and renaturing steps proper.

Bearing this in mind, the period of time for which the denaturing conditions of a denaturing step are kept substantially constant may, e.g., have a duration of at least one millisecond and at most one hour, often at most 30 minutes, and the period of time for which the renaturing conditions of a renaturing step are kept substantially constant has a duration of at least 1 second and at most 12 hours, and often at most 2 hours.

In practice, the period of time for which the denaturing conditions of a denaturing step are kept substantially constant will often have a duration of between 1 and 10 minutes, and the period of time for which the renaturing conditions of a renaturing step are kept substantially constant will often have a duration of between 1 and 45 minutes.

It will be understood from the above that adjustments should be made to the intervals stated above, taking into consideration the change of kinetics resulting from the change in physical conditions to which the polypeptides are subjected. For instance, the pressure may be very high (up to 5000 bar) when an HPLC system is used to perform the method of the invention, and under such circumstances very rapid steps may be accomplished and/or necessary. Further, as can be seen from the examples, the temperature parameter is of importance, as some proteins will only refold properly at temperatures far from the physiological range. Both temperature and pressure will of course have an effect on the kinetics of the refolding procedure of the invention, and therefore the above-indicated time intervals of renaturing and denaturing steps are realistic boundaries for the many possible embodiments of the invention.

For a given utilization of the method of the invention, the skilled person will be able to determine suitable conditions based, e.g., on preliminary experiments.

As indicated above, the polypeptide molecules are normally in contact with a liquid phase during the denaturing and renaturing steps, the liquid phase normally being an aqueous phase. This means that any reagents or auxiliary substances used in the method will normally be dissolved in the liquid phase, normally in an aqueous phase. However, if convenient, the liquid phase may also be constituted by one or more organic solvents.

In connection with renaturing of proteins, it is well known to use a so-called "chaperone" or "chaperone complex". Chaperones are a group of recently described proteins that share a common feature in their capability of enhancing refolding of unfolded or partly unfolded proteins. Often, the chaperones are multimolecular complexes. Many of these chaperones are heat-shock proteins, which means that in vivo they serve as factors performing post-traumatic "repair" on proteins that have been destabilized by the trauma. To be able to fulfil this function, chaperones tend to be more stable to traumatic events than many other proteins and protein complexes. While the method of the invention does not depend on the use of a molecular chaperone or a molecular chaperone complex, it is, of course, possible to have a suitable molecular chaperone or molecular chaperone complex present during at least one renaturing step, and it may be preferred to have a molecular chaperone or a molecular chaperone complex present during substantially all cycles.

As mentioned above, the polypeptide molecules are preferably substantially confined to an environment which allows changing or exchanging the liquid phase substantially without entraining the polypeptide molecules. This can be achieved in a number of ways. For instance, the polypeptide molecules may be contained in a dialysis device, or they may be confined to one of the phases of a suitable liquid two-phase system. Such a suitable aqueous two-phase system may, e.g., contain a polymer selected from the group consisting of polyethylene oxide (polyethylene glycol), polyvinyl acetate, dextran and dextran sulphate. In one interesting setup, one phase contains polyethylene oxide (polyethylene glycol) and the other phase contains dextran, whereby the polypeptide molecules will be confined to the dextran-containing phase.

Another way of avoiding entraining the polypeptide is by having the polypeptide molecules bound to a solid or semi-solid carrier, such as a filter surface, a hollow fibre or a beaded chromatographic medium, e.g. an agarose or polyacrylamide gel, a fibrous cellulose matrix or an HPLC or FPLC (Fast Performance Liquid Chromatography) matrix. As another measure, the carrier may be a substance having molecules of such a size that the molecules with the polypeptide molecules bound thereto, when dissolved or dispersed in a liquid phase, can be retained by means of a filter, or the carrier may be a substance capable of forming micelles or participating in the formation of micelles, allowing the liquid phase to be changed or exchanged substantially without entraining the micelles. In cases where the micelle-forming components would tend to escape from the system as monomers, e.g. where they would be able to some extent to pass an ultrafilter used in confining the system, this could be compensated for by replenishment with additional micelle-forming monomers.

The carrier may also be a water-soluble polymer having molecules of a size which will substantially not be able to pass through the pores of a filter or other means used in confining the system.

The polypeptide molecules are suitably non-covalently adsorbed to the carrier through a moiety having affinity to a component of the carrier. Such a moiety may, e.g., be a biotin group or an analogue thereof bound to an amino acid moiety of the polypeptide, the carrier having avidin, streptavidin or analogues thereof attached thereto so as to establish a system with a strong affinity between the thus modified polypeptide molecules and the thus modified carrier. It will be understood that the affinity between the modified polypeptide and the modified carrier should be sufficiently stable so that the adsorption will be substantially unaffected by the denaturing conditions; the removal of the polypeptide molecules from the carrier after the cycling should be performed using specific cleaving, such as is explained in the following.

An example of a suitable amino acid residue to which a biotinyl group may be bound is lysine.

One interesting way of introducing an amino acid carrying a moiety having affinity to the carrier is CPY synthesis. CPY (carboxypeptidase Y) is known to be capable of adding an amino acid amide irrespective of the nature of the side chain of that amino acid amide.

In an interesting embodiment, the moiety having affinity to the carrier is the polypeptide segment SEQ ID NO: 47, in which case the carrier suitably comprises a Nitrilotriacetic Acid derivative (NTA) charged with Ni⁺⁺ ions, for instance an NTA-agarose matrix which has been bathed in a solution comprising Ni⁺⁺.

An important aspect of the invention relates to the presence of suitable means in the polypeptide molecule preparing the molecule for later cleavage into two or more segments, wherein one segment is an authentic polypeptide as defined above. Such combined polypeptide molecules (fusion polypeptide molecules) may for this purpose comprise a polypeptide segment which is capable of directing preferential cleavage by a cleaving agent at a specific peptide bond. The polypeptide segment in question may be one which directs the cleavage as a result of the conformation of the segment which serves as a recognition site for the cleaving agent.

The cleavage-directing polypeptide segment may for instance be capable of directing preferential cleavage at a specific peptide bond by a cleaving agent selected from the group consisting of cyanogen bromide, hydroxylamine, iodosobenzoic acid and N-bromosuccinimide.

The cleavage-directing polypeptide segment may be one which is capable of directing preferential cleavage at a specific peptide bond by a cleaving agent which is an enzyme, and one such possible enzyme is bovine enterokinase or an analogue and/or homologue thereof.

In an important aspect of the invention, the cleaving agent is the enzyme bovine coagulation factor X_(a) or an analogue and/or homologue thereof (such analogues will be discussed in greater detail further below), and the polypeptide segment which directs preferential cleavage is a sequence which is substantially selectively recognized by the bovine coagulation factor X_(a) or an analogue and/or homologue thereof. Important such segments are those selected from the group consisting of SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42.

An interesting feature of the invention is the possibility of masking and unmasking polypeptide segments with respect to their ability to direct cleavage at a specific peptide bond, whereby it is obtained that different segments of the polypeptide can be cleaved at different stages in the cycles.

Thus, when the polypeptide molecules comprise a polypeptide segment which is in vitro-convertible into a derivatized polypeptide segment capable of directing preferential cleavage by a cleaving agent at a specific peptide bond, a masking/unmasking effect as mentioned becomes available. An especially interesting version of this strategy is where the in vitro-convertible polypeptide segment is convertible into a derivatized polypeptide segment which is substantially selectively recognized by the bovine coagulation factor X_(a) or an analogue and/or homologue thereof.

It is contemplated that both cysteine and methionine residues can be converted into modified residues, which modified residues make the segments having amino acid sequences selected from the group consisting of SEQ ID NO: 43, SEQ ID NO: 11, SEQ ID NO: 45 and SEQ ID NO: 46 in vitro converted into segments recognized by bovine coagulation factor X_(a) or an analogue and/or homologue thereof.

According to the invention, one possible solution involving the cysteine residue is that a polypeptide segment with the amino acid sequence SEQ ID NO: 43 or SEQ ID NO: 44 is converted into a derivatized polypeptide which is substantially selectively recognized by bovine coagulation factor X_(a), by reacting the cysteine residue with N(2-mercaptoethyl)morpholyl-2-thiopyridyl disulphide or mercaptothioacetate-2-thiopyridyl disulphide.

A possible strategy according to the invention involving methionine is that a polypeptide segment with the amino acid sequence SEQ ID NO: 45 or SEQ ID NO: 46 is converted into a derivatized polypeptide, which is substantially selectively recognized by bovine coagulation factor X_(a), by oxidation of the thioether moiety in the methionine side group to a sulphoxide or sulphone derivative.

Preferred embodiments of the method according to the invention are those wherein the cleavage-directing segments with the amino acid sequences SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 or SEQ ID NO: 42, or the masked cleavage-directing segments with the amino acid sequences SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46, are linked N-terminally to the authentic polypeptide, because then no further processing other than the selective cleaving is necessary in order to obtain the authentic polypeptide in solution. On the other hand, one possible reason for linking the cleavage-directing sequences at the C-terminal end of the authentic polypeptide would be that the correct folding of the polypeptide molecules is dependent on a free N-terminal of the polypeptide molecules. In such a case, the part of the cleavage-directing sequence remaining after cleaving can be removed by suitable use of carboxypeptidases A and B.

The change of conditions during the transition period between the steps may according to the invention be accomplished by changing the chemical composition of the liquid phase with which the polypeptide molecules are in contact. Thus, denaturing of the polypeptide molecules may be accomplished by contacting the polypeptide molecules with a liquid phase in which at least one denaturing compound is dissolved, and renaturing of the polypeptide molecules is accomplished by contacting the polypeptide molecules with a liquid phase which either contains at least one dissolved denaturing compound in such a concentration that the contact with the liquid phase will tend to renature rather than denature the ensemble of polypeptide molecules in their respective conformation states resulting from the preceding step, or contains substantially no denaturing compound.

The expression "denaturing compound" refers to a compound which, when present as one of the solutes in a liquid phase comprising polypeptide molecules, may destabilize folded states of the polypeptide molecules, leading to partial or complete unfolding of the polypeptide chains. The denaturing effect exerted by a denaturing compound increases with increasing concentration of the denaturing compound in the solution, but may furthermore be enhanced or moderated due to the presence of other solutes in the solution, or by changes in physical parameters, e.g. temperature or pressure.

As examples of suitable denaturing compounds to be used in the method according to the invention may be mentioned urea, guanidine-HCl, di-C₁₋₆-alkylformamides such as dimethylformamide and di-C₁₋₆-alkylsulphones.

The liquid phase used in at least one of the denaturing steps and/or in at least one of the renaturing steps may according to the invention contain at least one disulphide-reshuffling system.

"Disulphide reshuffling systems" are redox systems which contain mixtures of reducing and oxidizing agents, the presence of which facilitates the breaking and making of disulphide bonds in a polypeptide or between polypeptides. Accordingly, "disulphide reshuffling agents" or "disulphide reshuffling compounds" are such reducing and oxidizing agents which facilitate the breaking and making of disulphide bonds in a polypeptide or between polypeptides. In an important aspect of the invention, the disulphide-reshuffling system contained in the aqueous phase which is in contact with the proteins comprises as a disulphide reshuffling system a mixture of a mercaptan and its corresponding disulphide compound.

As an example, all cysteine residues in the polypeptide molecules may have been converted to mixed disulphide products of either glutathione, thiocholine, mercaptoethanol or mercaptoacetic acid, during at least one of the denaturing/renaturing cycles. Such a converted polypeptide is termed a "fully disulphide-blocked polypeptide or protein" and this term thus refers to a polypeptide or a protein in which cysteine residues have been converted to a mixed disulphide in which each cysteine residue is disulphide-linked to a mercaptan, e.g. glutathione. The conversion of the cysteine residues to mixed disulphide products may be accomplished by reacting a fully denatured and fully reduced ensemble of polypeptide molecules with an excess of a reagent which is a high-energy mixed disulphide compound, such as an aliphatic-aromatic disulphide compound, e.g. 2-thiopyridyl glutathionyl disulphide, or by any other suitable method.

As examples of high-energy mixed disulphides (that is, mixed disulphides having a relatively unstable S--S bond) may be mentioned mixed disulphides having the general formula: ##STR1## wherein R₁ is 2-pyridyl, and each of R₂, R₃ and R₄ is hydrogen or an optionally substituted lower aromatic or aliphatic hydrocarbon group. Examples of such mixed disulphides are glutathionyl-2-thiopyridyl disulphide, 2-thiocholyl-2-thiopyridyl disulphide, 2-mercaptoethanol-2-thiopyridyl disulphide and mercaptoacetate-2-thiopyridyl disulphide.

In interesting embodiments, the disulphide-reshuffling system contains glutathione, 2-mercaptoethanol or thiocholine, each of which is in admixture with its corresponding symmetrical disulphide.

The suitability of a given mixture of thiols for use as a selective reducing and/or disulphide-reshuffling system in a cyclic refolding/reoxidation procedure for a specific protein product can be directly assayed by incubating ensembles of samples of a mixture of folded and misfolded protein with an array of thiol mixtures at several different concentrations of denaturant exerting weakly, intermediately or strongly denaturing effects on the protein. Following incubation, the disulphide topology in each sample is locked by reaction with an excess of thiol-blocking reagent (e.g. iodoacetamide) before subjecting each set of samples to SDS-PAGE under non-reducing conditions. Correctly disulphide-bridged material and material in undesired covalent topological states will appear in separate bands and will therefore allow quantitative assessment of the folding state of the protein at the time of thiol-blocking, because only the unique, correctly disulphide-bonded topoisomer may correspond to correctly folded protein present at the end of incubation with thiol/disulphide and denaturant agents. This set of experiments allows identification of the range of denaturant levels at which a given thiol/disulphide reagent may be advantageously used as disulphide reshuffling agent, as revealed by preferential reduction and reshuffling of wrong disulphide bonds and low tendency to reduce bonds in the fully folded protein. This reagent testing procedure may be used as a general procedure for selecting advantageous reducing and/or thiol/disulphide reshuffling reagents. Example 12 demonstrates application of this analytical procedure to assess the suitability for selective reduction of misfolded forms of a model protein for 5 thiol reagents and thereby demonstrates the operability of the above procedure.

It will be understood that the above-indicated procedure for selecting suitable disulphide reshuffling systems may also be employed for selecting other compositions than mixtures of thiols. Any mixture containing suitable reducing/oxidizing agents may be evaluated according to the above-indicated procedure, and the composition of choice in the method of the invention will be the one which shows the highest ability to preferentially reduce incorrectly formed disulphide bridges.

Thus, a very important aspect of the invention is a method for protein refolding as described herein, wherein at least one disulphide-reshuffling system contained in the liquid phase in at least one renaturing and/or denaturing step is one which is capable of reducing and/or reshuffling incorrectly formed disulphide bridges under conditions with respect to concentration of the denaturing agent at which unfolded and/or misfolded proteins are denatured and at which there is substantially no reduction and/or reshuffling of correctly formed disulphide bridges.

An interesting embodiment of the invention is a method as described above, wherein a disulphide reshuffling system is used in at least one denaturing/renaturing step, resulting in a ratio between the relative amount of reduced/reshuffled initially incorrectly formed disulphide bridges and the relative amount of reduced/reshuffled initially correctly formed disulphide bridges of at least 1.05. The ratio will preferably be higher, such as 1.1, 1.5, 2.0, 3.0, 5.0, 10, 100, 1000, but even higher ratios are realistic and are thus especially preferred according to the invention.

By the terms "initially incorrectly/correctly" with respect to the form of disulphide bridges is meant the disulphide bridging topology just before the disulphide reshuffling system exerts its effects.

It will be understood that the ratio has to be greater than 1 in order to allow the net formation of correctly formed disulphide bridges in a protein sample. Normally the ratio should be as high as possible, but even ratios which are marginally above 1 will allow the net formation of correctly formed disulphide bridges in the method of the invention, the important parameter in ensuring a high yield being the number of denaturing/renaturing cycles. Ratios just above one require that many cycles are completed before a substantive yield of correctly formed disulphide bridges is achieved, whereas high ratios only require a limited number of cycles.
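The role of the ratio can be illustrated with a deliberately simplified two-state model. This is only a sketch under assumptions that are not part of the description: every molecule is either correctly folded or misfolded, each denaturing step re-opens a fixed fraction `u_mis` of the misfolded and `u_cor` of the correctly folded molecules (so that `u_mis / u_cor` plays the part of the ratio), and each re-opened molecule refolds correctly with a fixed probability `q`. All names and numbers are hypothetical.

```python
def cycle_fractions(n_cycles, u_mis, u_cor, q, f0=0.0):
    """Correctly folded fraction after each cycle of a toy two-state model.

    u_mis: fraction of misfolded molecules re-opened per denaturing step
    u_cor: fraction of correctly folded molecules re-opened per denaturing step
    q:     probability that a re-opened molecule refolds correctly
    f0:    correctly folded fraction before the first cycle
    All parameters are illustrative assumptions, not measured values.
    """
    folded = f0
    history = []
    for _ in range(n_cycles):
        misfolded = 1.0 - folded
        # Denaturing step: part of each population is re-opened.
        reopened = u_mis * misfolded + u_cor * folded
        # Renaturing step: re-opened molecules refold correctly with probability q.
        folded = folded * (1.0 - u_cor) + q * reopened
        history.append(folded)
    return history


def plateau(u_mis, u_cor, q):
    """Limiting correctly folded fraction of the toy model."""
    gain = q * u_mis            # misfolded -> correct, per cycle
    loss = (1.0 - q) * u_cor    # correct -> misfolded, per cycle
    return gain / (gain + loss)
```

In this toy model a ratio of exactly 1 gives a plateau equal to `q`, i.e. no gain over a single refolding pass, whereas a ratio above 1 raises the plateau above `q`; how many cycles are needed to approach the plateau depends on the absolute values chosen for `u_mis`, `u_cor` and `q`.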

In cases where only one disulphide reshuffling system is going to be employed, such a disulphide-reshuffling system may according to the invention be selected by

1) incubating samples of folded and misfolded protein of the same amino acid sequence as the protein to be processed in the method of the invention with an array of disulphide-reshuffling systems at several different concentrations of a chosen denaturing agent,

2) assessing at each of the different concentrations of denaturing agent the ability of each of the disulphide reshuffling systems to reduce and/or reshuffle initially incorrectly formed disulphide bridges without substantially reducing and/or reshuffling initially correctly formed disulphide bridges, as assessed by calculating the ratio between the relative amount of reduced/reshuffled initially incorrectly formed disulphide bridges and the relative amount of reduced/reshuffled initially correctly formed disulphide bridges, and

3) selecting as the disulphide reshuffling system X the disulphide-reshuffling system which exhibits the capability of reducing initially incorrectly formed disulphide bridges without substantially reducing and/or reshuffling initially correctly formed disulphide bridges in the widest range of concentrations of the chosen denaturing agent.
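Steps 1) to 3) above can be sketched as a small selection routine. The sketch assumes that the ratio of step 2) has already been determined for each system at each denaturant concentration, and it interprets "widest range" as the widest unbroken run of tested concentrations at which the ratio reaches a chosen threshold (1.05 here, taken from the minimum ratio mentioned above). The function names, the data layout and the example numbers are hypothetical.

```python
def selective_span(ratios, min_ratio):
    """Width of the widest unbroken run of concentrations with ratio >= min_ratio.

    ratios: {denaturant concentration: measured ratio}
    Returns None if the system is not selective at any tested concentration.
    """
    best = None
    run_start = None
    for conc in sorted(ratios):
        if ratios[conc] >= min_ratio:
            if run_start is None:
                run_start = conc
            width = conc - run_start
            if best is None or width > best:
                best = width
        else:
            run_start = None  # the run of selective concentrations is broken
    return best


def select_system(ratios_by_system, min_ratio=1.05):
    """Pick the system that is selective over the widest concentration range."""
    spans = {name: selective_span(r, min_ratio)
             for name, r in ratios_by_system.items()}
    usable = {name: s for name, s in spans.items() if s is not None}
    if not usable:
        return None
    return max(usable, key=usable.get)


# Hypothetical ratios at 2-5 M denaturant for two candidate systems.
example = {
    "glutathione": {2.0: 1.5, 3.0: 3.0, 4.0: 2.0, 5.0: 0.9},
    "2-mercaptoethanol": {2.0: 0.8, 3.0: 1.2, 4.0: 0.9, 5.0: 1.1},
}
chosen = select_system(example)
```

With these invented numbers the first system is selective from 2 M to 4 M while the second is selective only at isolated concentrations, so the first is chosen.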

Alternatively, more than one disulphide-reshuffling system may be employed, for instance in different cycles in the cyclic refolding method of the invention, but also simultaneously in the same cycles. This will e.g. be the case when it is likely, or has been established by e.g. the method outlined above, that the overall yield of correctly folded protein with correct disulphide bridging topology will be higher if using different disulphide-reshuffling systems in the method of the invention.

In order to calculate the above-indicated ratio between the relative amount of reduced/reshuffled initially incorrectly formed disulphide bridges and the relative amount of reduced/reshuffled initially correctly formed disulphide bridges, the following method may be employed: to the initial mixture of reactants in step 1) is added a known amount of radioactively labelled correctly folded protein. When the amounts of correctly and incorrectly folded protein are assessed in step 2) (for instance by non-reducing SDS-PAGE), the content of radioactivity in the correctly folded protein fraction is determined as well. Thereby an assessment of the now incorrectly folded (but initially correctly folded) protein can be determined in parallel with the determination of the total distribution of correctly/incorrectly folded protein. The above-mentioned ratio can thus be calculated as ##EQU1## wherein C₁ and C₂ are the initial and the final amounts of correctly folded proteins, respectively, U₁ is the amount of initially incorrectly folded protein, and A₁ and A₂ are the radioactivity in the initial correctly folded protein fraction and in the final correctly folded protein, respectively.

In addition to the denaturing means mentioned above, denaturing may also be achieved or enhanced by decreasing pH of the liquid phase, or by increasing pH of the liquid phase.

The polarity of the liquid phase used in the renaturing may according to the invention have been modified by the addition of a salt, a polymer and/or a hydrofluoro compound such as trifluoroethanol.

According to the invention, the denaturing and renaturing of the polypeptide molecules may also be accomplished by direct changes in physical parameters to which the polypeptide molecules are exposed, such as temperature or pressure, or these measures may be utilized to enhance or moderate the denaturing or renaturing resulting from the other measures mentioned above.

However, it will be understood that a most important practical embodiment of the method is performed by accomplishing chemical changes in the liquid phase by changing between a denaturing solution B and a renaturing solution A. In this case, the concentration of one or more denaturing compounds in B will often be adjusted after each cycle, and as one important example, the concentration of one or more denaturing compounds in B will be decremented after each cycle, but in another important embodiment, the concentration of one or more denaturing compounds in medium B is kept constant in each cycle.
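The two variants just described (decrementing the denaturant concentration in B after each cycle, or keeping it constant) can be written down as a simple schedule generator. This is a minimal sketch assuming a linear decrement between a start and an end concentration; the description does not prescribe the shape of the decrement, and the function name, parameters and example concentrations are hypothetical.

```python
def denaturant_schedule(start, end, n_cycles, constant=False):
    """Denaturant concentration of solution B for each cycle.

    start, end: concentrations used in the first and last cycle (same unit)
    n_cycles:   number of cycles; the method uses at least two
    constant:   if True, keep the start concentration in every cycle
    A linear decrement is an assumption made for illustration only.
    """
    if n_cycles < 2:
        raise ValueError("the cycling comprises at least two cycles")
    if constant:
        return [start] * n_cycles
    step = (start - end) / (n_cycles - 1)
    return [start - i * step for i in range(n_cycles)]


# Hypothetical example: four cycles stepping from 8.0 down to 0.5.
decrementing = denaturant_schedule(8.0, 0.5, 4)
fixed = denaturant_schedule(4.0, 4.0, 3, constant=True)
```

Each value in the returned list would be the denaturant level of B in the corresponding cycle, with solution A applied between successive entries.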

This embodiment of the invention, wherein the concentration of denaturing compound(s) in medium B is kept constant, is especially interesting when the most productive phase of the cycling process (with respect to correctly folded protein) has been identified, and large-scale production of correctly folded protein is desired. As will be understood, the preferred concentration(s) of denaturing compound(s) of medium B in this embodiment is the concentration(s) which has been established to ensure maximum productivity in the cyclic process according to the invention.

The polypeptide molecules of the ensemble which is subjected to the method of the invention normally have a length of at least 25 amino acid residues, such as at least 30 amino acid residues or at least 50 amino acid residues. On the other hand, the polypeptide molecules of the ensemble normally have a length of at most 5000 amino acid residues, such as at most 2000 amino acid residues or at most 1000 or 800 amino acid residues.

As can be seen from Example 10, the method of the invention has made possible the production of correctly folded diabody molecules (diabodies are described in Holliger et al., 1993).

An important aspect of the invention therefore relates to a method for producing correctly folded diabody molecules, wherein an initial ensemble of polypeptide molecules comprising unfolded and/or misfolded polypeptides having amino acid sequences identical to the amino acid sequences of monomer fragments of diabody molecules is subjected to a series of at least two successive cycles, each of which comprises a sequence of

1) at least one denaturing step involving conditions exerting a denaturing influence on the polypeptide molecules of the ensemble followed by

2) at least one renaturing step involving conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step,

the series of cycles being so adapted that a substantial fraction of the initial ensemble of polypeptide molecules is converted to a fraction of correctly folded diabody molecules.

Such a method for the correct folding of diabodies can be envisaged in any of the above-mentioned scenarios and aspects of the refolding method of the invention, that is, with respect to the choice of physical/chemical conditions as well as cycling schedules. However, an important aspect of the method for correct folding of diabodies is a method as identified above, wherein the polypeptide molecules are in contact with a liquid phase containing at least one disulphide reshuffling system in at least one denaturing or renaturing step. The preferred denaturing agent to be used in such a liquid phase is urea, and the preferred disulphide reshuffling system comprises glutathione as the main reducing agent.

A particular aspect of the invention relates to a polypeptide which is a proenzyme of a serine protease, but is different from any naturally occurring serine protease and, in particular, has an amino acid sequence different from that of bovine coagulation factor X (Protein Identification Resource (PIR), National Biomedical Research Foundation, Georgetown University, Medical Center, U.S.A., entry: P1;EXBO) and which can be proteolytically activated to generate the active serine protease by incubation of a solution of the polypeptide in a non-denaturing buffer with a substance that cleaves the polypeptide to liberate a new N-terminal residue,

the substrate specificity of the serine protease being identical to or better than that of bovine blood coagulation factor X_(a), as assessed by each of the ratios (k(I)/k(V) and k(III)/k(V)) between cleavage rate against each of the substrates I and III:

I: Benzoyl-Val-Gly-Arg-paranitroanilide,

III: Tosyl-Gly-Pro-Arg-paranitroanilide,

versus that against the substrate

V: Benzoyl-Ile-Glu-Gly-Arg-paranitroanilide

at 20° C., pH=8 in a buffer consisting of 50 mM Tris, 100 mM NaCl, 1 mM CaCl₂, being identical to or lower than the corresponding ratio determined for bovine coagulation factor X_(a) which is substantially free from contaminating proteases.

The characterization of the above-identified new polypeptides as serine proteases is in accordance with the normal nomenclatural use of the term serine proteases. As is well known in the art, serine proteases are enzymes which are believed to have a catalytic system consisting of an active-site serine which is aligned with a histidine residue, and it is believed that the activation of the enzymes from the corresponding proenzymes is based on the liberation of a new N-terminal residue, the α-amino group of which is capable of repositioning within the polypeptide structure to form a salt bridge to an aspartic acid residue preceding an active-site serine residue, thereby forming the catalytic site characteristic of serine proteases.

The "artificial" serine proteases defined above are extremely valuable polypeptide cleaving tools for use in the method of the invention and in other methods where it is decisive to have a cleaving tool which will selectively cleave proteins, even large folded proteins. Analogously to bovine coagulation factor X_(a), the above-defined artificial serine proteases in activated form are capable of selectively recognizing the cleavage-directing polypeptide segment SEQ ID NO: 38, but in contrast to bovine coagulation factor X_(a), they can be established with such amino acid sequences that they can be readily produced using recombinant DNA techniques. Thus, the preferred artificial serine proteases of the invention are ones which have amino acid sequences allowing their synthesis by recombinant DNA techniques, in particular in prokaryote cells such as E. coli. As will appear from the following discussion and the examples, the artificial serine proteases of the invention, when produced in a prokaryote, may be given an enzymatically active conformation, in which the catalytically active domains are suitably exposed, by cycling according to the method of the present invention.

The quantitative test for selectivity of the artificial serine proteases involves determination of the cleavage rate, k, determined as the initial slope of a curve of absorption of light at 405 nm (absorption maximum of free paranitroaniline) versus time at 20° C.
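The initial slope can be estimated, for example, by a least-squares line through the first few readings of the absorbance curve. The sketch below assumes that the readings are taken while the curve is still linear and that the first `n_initial` points represent the initial phase; the function name, the choice of five points and the example readings are hypothetical.

```python
def initial_rate(times, absorbance_405, n_initial=5):
    """Least-squares slope of A405 versus time over the first n_initial readings.

    times:          reading times (any consistent unit)
    absorbance_405: absorbance at 405 nm at those times
    n_initial:      number of early readings treated as the initial, linear phase
                    (an assumption; choose it from the actual curve)
    """
    t = list(times[:n_initial])
    a = list(absorbance_405[:n_initial])
    if len(t) < 2 or len(t) != len(a):
        raise ValueError("need at least two matching time/absorbance readings")
    mean_t = sum(t) / len(t)
    mean_a = sum(a) / len(a)
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (ai - mean_a) for ti, ai in zip(t, a))
    if sxx == 0:
        raise ValueError("time readings must not all be identical")
    return sxy / sxx


# Hypothetical readings: absorbance rising by 0.002 per second from 0.05.
example_times = [0, 10, 20, 30, 40, 50]
example_a405 = [0.05 + 0.002 * t for t in example_times]
k_example = initial_rate(example_times, example_a405)
```

Rates obtained in this way for the individual substrates can then be divided by the rate for substrate V to give the ratios discussed below.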

Expressed quantitatively, the selectivity of the artificial serine proteases should be characterized by the value of k(I)/k(V) being at most 0.06, and the value of k(III)/k(V) being at most 0.5. It is preferred that k(I)/k(V) is at most 0.05 and k(III)/k(V) is at most 0.4, and more preferred that k(I)/k(V) is at most 0.04 and k(III)/k(V) is at most 0.15.

A more comprehensive specificity characterization involves further model substrates: thus, the substrate specificity could be assessed to be identical to or better than that of bovine blood coagulation factor X_(a) by each of the ratios (k(I)/k(V), k(II)/k(V), k(III)/k(V) and k(IV)/k(V)) between cleavage rate against each of the substrates I-IV:

I: Benzoyl-Val-Gly-Arg-paranitroanilide,

II: Tosyl-Gly-Pro-Lys-paranitroanilide,

III: Tosyl-Gly-Pro-Arg-paranitroanilide,

IV: (d,l)Val-Leu-Arg-paranitroanilide

versus that against the substrate

V: Benzoyl-Ile-Glu-Gly-Arg-paranitroanilide

at 20° C., pH=8, in a buffer consisting of 50 mM Tris, 100 mM NaCl, 1 mM CaCl₂, being identical to or lower than the corresponding ratio determined for bovine coagulation factor X_(a) which is substantially free from contaminating proteases.

Within this characterization, k(I)/k(V) should be at most 0.06, k(II)/k(V) should be at most 0.03, k(III)/k(V) should be at most 0.5, and k(IV)/k(V) should be at most 0.01, and it is preferred that k(I)/k(V) is at most 0.05, k(II)/k(V) is at most 0.025, k(III)/k(V) is at most 0.4, and k(IV)/k(V) is at most 0.008, and more preferred that k(I)/k(V) is at most 0.04, k(II)/k(V) is at most 0.015, k(III)/k(V) is at most 0.15, and k(IV)/k(V) is at most 0.005.
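The three levels of limits stated above can be tabulated and checked mechanically. The following is a minimal sketch, assuming the five cleavage rates have already been measured under the stated conditions; the tier labels, the function name and the sample rates are hypothetical. The sketch does not perform the separate comparison against the ratios determined for bovine coagulation factor X_(a).

```python
# Upper limits for k(substrate)/k(V), copied from the text above.
LIMITS = {
    "required":       {"I": 0.06, "II": 0.03,  "III": 0.5,  "IV": 0.01},
    "preferred":      {"I": 0.05, "II": 0.025, "III": 0.4,  "IV": 0.008},
    "more preferred": {"I": 0.04, "II": 0.015, "III": 0.15, "IV": 0.005},
}


def selectivity_tier(rates):
    """Return the strictest tier whose limits are all met, or None.

    rates: dict mapping 'I', 'II', 'III', 'IV', 'V' to measured cleavage
           rates k, all determined under the same assay conditions.
    """
    ratios = {s: rates[s] / rates["V"] for s in ("I", "II", "III", "IV")}
    met = None
    for tier in ("required", "preferred", "more preferred"):
        if all(ratios[s] <= limit for s, limit in LIMITS[tier].items()):
            met = tier
        else:
            break  # stricter tiers cannot be met once one tier fails
    return met


# Hypothetical rates, expressed relative to substrate V.
example = {"I": 0.045, "II": 0.02, "III": 0.3, "IV": 0.006, "V": 1.0}
tier = selectivity_tier(example)
```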

The serine protease type polypeptide as defined above will normally have a molecular weight, M_(r), of at most 70,000 and at least 15,000.

One such novel polypeptide according to the invention has the amino acid sequence SEQ ID NO: 2 or is an analogue and/or homologue thereof. Other important embodiments of the polypeptide of the invention have an amino acid sequence which is a subsequence of SEQ ID NO: 2 or an analogue and/or homologue of such a subsequence.

By the use of the term "an analogue of a polypeptide encoded by the DNA sequence" or "an analogue of a polypeptide having the amino acid sequence" is meant any polypeptide which is capable of performing as bovine coagulation factor X_(a) in the tests mentioned above. Thus, included are also polypeptides from different sources, such as different mammals or vertebrates, which vary e.g. to a certain extent in the amino acid composition, or the post-translational modifications e.g. glycosylation or phosphorylation, as compared to the artificial serine protease described in the examples.

The term "analogue" is thus used in the present context to indicate a protein or polypeptide of a similar amino acid composition or sequence as the characteristic amino acid sequence SEQ ID NO: 2 derived from an artificial serine protease as described in Example 5, allowing for minor variations that alter the amino acid sequence e.g. deletions, site directed mutations, insertions of extra amino acids, or combinations thereof, to generate artificial serine protease analogues.

Therefore, in the present description and claims, an analogue (of a polypeptide) designates a variation of the polypeptide in which one or several amino acids may have been deleted or exchanged, and/or amino acids may have been introduced, provided the enzymatic activity with the above-defined specificity is retained, as can be assessed as described above.

With respect to homology, an analogue of a polypeptide according to the invention may have a sequence homology at the polypeptide level of at least 60% identity compared to the sequence of a fragment of SEQ ID NO: 2, allowing for deletions and/or insertions of at most 50 amino acid residues.

Such polypeptide sequences or analogues thereof which have a homology of at least 60% with the polypeptide shown in SEQ ID NO: 2 encoded for by the DNA sequence of the invention SEQ ID NO: 1 or analogues and/or homologues thereof, constitute an important embodiment of this invention.

By the term "sequence homology" is meant the identity in sequence of either the amino acids in segments of two or more amino acids in an amino acid sequence, or the nucleotides in segments of two or more nucleotides in a nucleotide sequence. With respect to polypeptides, the terms are thus intended to mean a homology between the amino acids in question between which the homology is to be established, in the match with respect to identity and position of the amino acids of the polypeptides.

The term "homologous" is thus used here to illustrate the degree of identity between the amino acid sequence of a given polypeptide and the amino acid sequence shown in SEQ ID NO: 2. The amino acid sequence to be compared with the amino acid sequence shown in SEQ ID NO: 2 may be deduced from a nucleotide sequence such as a DNA or RNA sequence, e.g. obtained by hybridization as defined in the following, or may be obtained by conventional amino acid sequencing methods.

Another embodiment relates to a polypeptide having an amino acid sequence from which a consecutive string of 20 amino acids is homologous to a degree of at least 40% with a string of amino acids of the same length selected from the amino acid sequence shown in SEQ ID NO: 2.
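One way such a window criterion could be evaluated is sketched below. The sketch assumes an ungapped, position-by-position comparison of every 20-residue window of a candidate sequence against every 20-residue window of a reference sequence; the function names are hypothetical, and the short sequences used are toy strings, not SEQ ID NO: 2.

```python
def window_identity(a, b):
    """Fraction of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def has_homologous_window(candidate, reference, window=20, min_identity=0.40):
    """True if some window of the candidate matches some window of the
    reference (same length, no gaps) at or above the identity threshold."""
    if len(candidate) < window or len(reference) < window:
        return False
    for i in range(len(candidate) - window + 1):
        cand_win = candidate[i:i + window]
        for j in range(len(reference) - window + 1):
            if window_identity(cand_win, reference[j:j + window]) >= min_identity:
                return True
    return False


# Toy sequences in one-letter code (illustrative only).
reference = "ACDEFGHIKLMNPQRSTVWYACDEF"
candidate = "ACDEFGHIKLAAAAAAAAAA"  # first 10 of 20 residues match the reference
result = has_homologous_window(candidate, reference)
```

An exhaustive comparison of this kind is quadratic in sequence length, which is acceptable for polypeptides of a few hundred residues.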

One serine protease polypeptide according to the invention has the amino acid sequence of SEQ ID NO: 2, residues 82-484, or is an analogue and/or homologue thereof. Another serine protease polypeptide according to the invention has the amino acid sequence of SEQ ID NO: 2, residues 166-484, or is an analogue and/or homologue thereof.

A number of modifications of the sequences shown herein are particularly interesting. One is the insertion of the cleaving-directing sequences SEQ ID NO: 38 or 40-42 instead of residues 230-233 in SEQ ID NO: 2, combined with exchange of cysteine residue 245 by preferably Gly, Ser or Arg in SEQ ID NO: 2. Another interesting possibility is insertion of SEQ ID NO: 38 or 40-42 instead of residues 179-182 in SEQ ID NO: 2. Quite generally, in any of the artificial serine proteases defined above, replacement of the cleaving sequence corresponding to residues 230-233 in SEQ ID NO: 2 with one of the cleavage-directing sequences defined above will give rise to extremely useful cleaving enzymes for use in the method according to the invention, in that these can be selectively and very efficiently cleaved by enzymes having the specific enzymatic activity of bovine coagulation factor X_(a), and thus by artificial serine proteases as defined above, including by molecules identical to themselves. The latter fact means that artificial serine proteases modified by such insertion of the specific cleaving-directing sequences can be extremely effectively activated, as the first molecules cleaved and activated will be able to cleave other molecules, thus starting a chain reaction.

As mentioned above, it is a most important feature that the artificial serine proteases can be produced by recombinant DNA techniques, and hence, another important embodiment of the invention relates to a nucleic acid fragment capable of encoding a polypeptide as defined above, in particular a DNA fragment which is capable of encoding an artificial serine protease polypeptide as defined above.

In one of its aspects, the invention relates to a nucleotide sequence encoding a polypeptide of the invention as defined above. In particular, the invention relates to a nucleotide sequence having the nucleotide sequence shown in the DNA sequence SEQ ID NO: 1 or an analogue thereof which has a homology with any of the DNA sequences shown in SEQ ID NO: 1 of at least 60%, and/or encodes a polypeptide, the amino acid sequence of which is at least 60% homologous with the amino acid sequences shown in SEQ ID NO: 2.

Generally, only coding regions are used when comparing nucleotide sequences in order to determine their internal homology.

The term "analogue" with regard to the DNA fragments of the invention is intended to indicate a nucleotide sequence which encodes a polypeptide identical or substantially identical to the polypeptide encoded by a DNA fragment of the invention. It is well known that the same amino acid may be encoded by various codons, the codon usage being related, inter alia, to the preference of the organisms in question expressing the nucleotide sequence. Thus, one or more nucleotides or codons of the DNA fragment of the invention may be exchanged by others which, when expressed, result in a polypeptide identical or substantially identical to the polypeptide encoded by the DNA fragment in question.
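The codon redundancy referred to above can be illustrated with the four-residue factor X_(a) recognition sequence IEGR. The sketch below uses only the part of the standard genetic code needed for these four residues; the two DNA strings are made-up examples and are not taken from SEQ ID NO: 1.

```python
# Subset of the standard genetic code covering Ile, Glu, Gly and Arg.
CODONS = {
    "ATT": "I", "ATC": "I", "ATA": "I",
    "GAA": "E", "GAG": "E",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R", "AGA": "R", "AGG": "R",
}


def translate(dna):
    """Translate a DNA string codon by codon using the subset table above."""
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of three")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


variant_1 = "ATTGAAGGTCGT"  # hypothetical coding sequence
variant_2 = "ATCGAGGGCCGC"  # differs from variant_1 in every codon

same_peptide = translate(variant_1) == translate(variant_2) == "IEGR"
```

Although the two strings differ at the third position of every codon, both encode the same peptide, which is the sense in which one DNA fragment is an analogue of the other.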

Furthermore, the term "analogue" is intended to allow for variations in the sequence such as substitution, insertion (including introns), addition and rearrangement of one or more nucleotides, which variations do not have any substantial effect on the polypeptide encoded by the DNA fragment.

Thus, within the scope of the present invention is a modified nucleotide sequence which differs from the DNA sequence shown in SEQ ID NO: 1 in that at least one nucleotide has been substituted, added, inserted, deleted and/or rearranged.

The term "substitution" is intended to mean the replacement of one or more nucleotides in the full nucleotide sequence with one or more different nucleotides, "addition" is understood to mean the addition of one or more nucleotides at either end of the full nucleotide sequence, "insertion" is intended to mean the introduction of one or more nucleotides within the full nucleotide sequence, "deletion" is intended to indicate that one or more nucleotides have been deleted from the full nucleotide sequence whether at either end of the sequence or at any suitable point within it, and "rearrangement" is intended to mean that two or more nucleotide residues have been exchanged within the DNA or polypeptide sequence, respectively. The DNA fragment may, however, also be modified by mutagenesis either before or after inserting it in the organism. The DNA or protein sequence of the invention may be modified in such a way that it does not lose any of its biophysical, biochemical or biological properties, or part of such properties (one and/or all) or all of such properties (one and/or all).

An example of a specific analogue of the DNA sequence of the invention is a DNA sequence which comprises the DNA sequence shown in SEQ ID NO: 1 and is particularly adapted for expression in E. coli. This DNA sequence is one which, when inserted in E. coli together with suitable regulatory sequences, results in the expression of a polypeptide having substantially the amino acid sequence shown in SEQ ID NO: 2. Thus, this DNA sequence comprises specific codons recognized by E. coli.

The terms "fragment", "sequence", "homologue" and "analogue", as used in the present specification and claims with respect to fragments, sequences, homologues and analogues according to the invention should of course be understood as not comprising these phenomena in their natural environment, but rather, e.g., in isolated, purified, in vitro or recombinant form.

One embodiment of the nucleic acid fragment according to the invention is a nucleic acid fragment as defined above in which at least 60% of the coding triplets encode the same amino acids as a nucleic acid fragment of the nucleic acid which encodes bovine coagulation factor X, allowing for insertions and/or deletions of at most 150 nucleotides. An example of such a nucleic acid fragment is SEQ ID NO: 1, nucleotides 76-1527, and analogues and/or homologues thereof. Another example is SEQ ID NO: 1, nucleotides 319-1527, and analogues and/or homologues thereof. Still another example is SEQ ID NO: 1, nucleotides 571-1527, and analogues and/or homologues thereof.

The DNA fragment described above and constituting an important aspect of the invention may be obtained directly from the genomic DNA or by isolating mRNA and converting it into the corresponding DNA sequence by using reverse transcriptase, thereby producing a cDNA. When obtaining the DNA fragment from genomic DNA, it is derived directly by screening for genomic sequences as is well known for the person skilled in the art. It can be accomplished by hybridization to a DNA probe designed on the basis of knowledge of the sequences of the invention, or the sequence information obtained by amino acid sequencing of a purified serine protease. When the DNA is of complementary DNA (cDNA) origin, it may be obtained by preparing a cDNA library with mRNA from cells containing an artificial serine protease. Hybridization can be accomplished by a DNA probe designed on the basis of knowledge of the cDNA sequence, or the sequence information obtained by amino acid sequencing of a purified artificial serine protease.

The DNA fragment of the invention or an analogue and/or homologue thereof of the invention can be replicated by fusing it with a vector and inserting the complex into a suitable microorganism or a mammalian cell line. Alternatively, the DNA fragment can be manufactured using chemical synthesis. Also, polymerase chain reaction (PCR) primers can be synthesized based on the nucleotide sequence shown in SEQ ID NO: 1. These primers can then be used to amplify the whole or a part of a sequence encoding an artificial serine protease polypeptide.

Suitable polypeptides of the invention can be produced using recombinant DNA technology. More specifically, the polypeptides may be produced by a method which comprises culturing or breeding an organism carrying the DNA sequence shown in SEQ ID NO: 1 or an analogue and/or homologue thereof of the invention under conditions leading to expression of said DNA fragment, and subsequently recovering the expressed polypeptide from the said organism.

The organism which is used for the production of the polypeptide may be a higher organism, e.g. an animal, or a lower organism, e.g. a microorganism. Irrespective of the type of organism used, the DNA fragment of the invention (described above) should be introduced in the organism either directly or with the help of a suitable vector. Alternatively, the polypeptides may be produced in the mammalian cell lines by introducing the DNA fragment or an analogue and/or homologue thereof of the invention either directly or with the help of an expression vector.

The DNA fragment of the invention can also be cloned in a suitable stable expression vector and then put into a suitable cell line. The cells expressing the desired polypeptides are then selected using the conditions suitable for the vector and the cell line used. The selected cells are then grown further and form a very important and continuous source of the desired polypeptides.

Thus, another aspect of the invention relates to an expression system comprising a nucleic acid fragment as defined above and encoding an artificial serine protease polypeptide as defined above, the system comprising a 5' flanking sequence capable of mediating expression of said nucleic acid fragment. The expression system may be a replicable expression vector carrying the nucleic acid fragment, which vector is capable of replicating in a host organism or a cell line; the vector may, e.g., be a plasmid, phage, cosmid, mini-chromosome or virus; the vector may be one which, when introduced in a host cell, is integrated in the host cell genome.

Another aspect of the invention relates to an organism which carries and is capable of replicating the nucleic acid fragment as defined above. The organism may be a microorganism such as a bacterium, a yeast, a protozoan, or a cell derived from a multicellular organism such as a fungus, an insect cell, a plant cell, a mammalian cell or a cell line. Particularly interesting host organisms are microorganisms such as a bacterium of the genus Escherichia, Bacillus or Salmonella.

A further aspect of the invention relates to a method of producing an artificial serine protease polypeptide as defined above, comprising the following steps of:

1. inserting a nucleic acid fragment as defined above in an expression vector,

2. transforming a host organism as defined above with the vector produced in step 1,

3. culturing the host organism produced in step 2 to express the polypeptide,

4. harvesting the polypeptide,

5. optionally subjecting the polypeptide to post-translational modification,

6. if necessary subjecting the polypeptide to the denaturing/renaturing cycling method according to the present invention, and

7. optionally subjecting the polypeptide to further modification to obtain an authentic polypeptide as defined above.

Further modifications of the polypeptides may for instance be accomplished by subjecting the polypeptide molecules to carboxypeptidase A or B, whereby selected amino acid residues may be removed from the C-terminus of the polypeptide molecules. This is desirable under circumstances wherein the optimal folding of the authentic polypeptide molecules only is achieved when the N-terminus is free and the cleavage directing polypeptide (such as SEQ ID NO: 37) thus is placed C-terminally of the authentic polypeptide. As is known, carboxypeptidase B cleaves sequentially from the C-terminus, and only cleaves off basic amino acids, whereas carboxypeptidase A cleaves off non-basic amino acids. By carefully designing which residue is adjoined C-terminally to the authentic polypeptide it is possible to ensure that all but the authentic polypeptide is cleaved by the carboxypeptidases. If the C-terminus of the authentic polypeptide is a basic amino acid residue one should assure that the C-terminally linked residue which is to be removed is non-basic and vice versa. If one knows the sequence of the amino acid residues from the C-terminus of the expressed polypeptide to the C-terminus of the authentic polypeptide it is possible to alternate between treatments with the two carboxypeptidases until only the naked, authentic polypeptide is left. A practical embodiment would be to use immobilized carboxypeptidases.
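The alternation between the two carboxypeptidases can be pictured with a small simulation. It is a sketch under simplifying assumptions that follow the description above: Lys (K) and Arg (R) are taken as the basic residues removed by carboxypeptidase B (labelled "B"), every other residue is treated as removable by carboxypeptidase A (labelled "A"), and each treatment runs to completion. All names and the example sequences are hypothetical.

```python
BASIC = set("KR")  # assumption: Lys and Arg are the 'basic' residues


def enzyme_for(residue):
    """'B' (carboxypeptidase B) for basic residues, 'A' otherwise."""
    return "B" if residue in BASIC else "A"


def plan_treatments(authentic, tail):
    """Order of carboxypeptidase treatments needed to remove the tail.

    Raises ValueError if the residue adjoining the authentic C-terminus
    is of the same class as that C-terminus, since trimming would then
    continue into the authentic polypeptide.
    """
    if not tail:
        return []
    if enzyme_for(tail[0]) == enzyme_for(authentic[-1]):
        raise ValueError("adjoining residue must differ in class from the "
                         "authentic C-terminal residue")
    plan = []
    for residue in reversed(tail):  # walk inward from the C-terminus
        enzyme = enzyme_for(residue)
        if not plan or plan[-1] != enzyme:
            plan.append(enzyme)
    return plan


def digest(sequence, enzyme):
    """Remove C-terminal residues until one of the other class is exposed."""
    while sequence and enzyme_for(sequence[-1]) == enzyme:
        sequence = sequence[:-1]
    return sequence


def trim(authentic, tail):
    """Apply the planned treatments to authentic + tail."""
    sequence = authentic + tail
    for enzyme in plan_treatments(authentic, tail):
        sequence = digest(sequence, enzyme)
    return sequence


# Hypothetical example: authentic polypeptide ending in Lys, tail Gly-Ser-Arg.
plan = plan_treatments("ACDEFK", "GSR")
product = trim("ACDEFK", "GSR")
```

In this example the plan is one treatment with carboxypeptidase B followed by one with carboxypeptidase A, after which trimming stops at the basic C-terminal residue of the authentic polypeptide.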

The polypeptide produced may be isolated by a method comprising one or more steps like affinity chromatography using immobilized polypeptide or antibodies reactive with said polypeptide and/or other chromatographic and electrophoretic procedures.

Also, it will be understood that a polypeptide of the invention may be prepared by the well known methods of liquid or solid phase peptide synthesis utilizing the successive coupling of the individual amino acids of the polypeptide sequence. Alternatively, the polypeptides can be synthesized by the coupling of individual amino acids forming fragments of the polypeptide sequence which are later coupled so as to result in the desired polypeptide. These methods thus constitute another interesting aspect of the invention.

The invention also relates to the use of an artificial serine protease polypeptide as defined above for cleaving polypeptides at the cleavage site for bovine coagulation factor X_(a), the cleavage site having the amino acid sequence selected from the group consisting of SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42, and to the use of an artificial serine protease polypeptide as defined above for cleaving polypeptides at the cleavage site for bovine coagulation factor X_(a), the cleavage site having a modified version of the amino acid sequence selected from the group of SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46, which has been converted to a cleavable form as described further above.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1: Schematic representation of a segment of a cyclic denaturation/renaturation time programme.

Solvent composition is expressed as the relative content of buffer B in a binary mixture of a non-denaturing `buffer A` and a denaturing `buffer B`. Three consecutive cycles are represented, each consisting of a renaturation phase `F` and a denaturation phase `D`. Changes in level of denaturing power of the solvent mixture during denaturation phases in consecutive cycles are denoted `k`.
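A time programme of this general shape can be written down as a table of steps. The sketch below is an illustration only: it assumes step-shaped phases, a constant decrement k of the buffer B content between consecutive denaturation phases, renaturation at 0% buffer B, and arbitrary phase durations. None of these values are taken from the figure or the examples, and the function name is hypothetical.

```python
def time_programme(n_cycles, b_denature_start, k, b_renature=0, t_f=45, t_d=15):
    """Return a list of (cycle, phase, percent_B, minutes) steps.

    n_cycles         : number of renaturation/denaturation cycles
    b_denature_start : percent buffer B in the first denaturation phase
    k                : decrease in percent buffer B between consecutive
                       denaturation phases (assumed constant here)
    b_renature       : percent buffer B during renaturation phases
    t_f, t_d         : durations of phases F and D (hypothetical values)
    """
    if n_cycles < 1:
        raise ValueError("at least one cycle is required")
    steps = []
    for cycle in range(1, n_cycles + 1):
        b_d = b_denature_start - (cycle - 1) * k
        if b_d <= b_renature:
            raise ValueError("denaturation level reached the renaturation level")
        steps.append((cycle, "F", b_renature, t_f))  # renaturation phase
        steps.append((cycle, "D", b_d, t_d))         # denaturation phase
    return steps


# Three consecutive cycles, as in the segment shown in the figure.
for step in time_programme(n_cycles=3, b_denature_start=100, k=4):
    print(step)
```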

FIG. 2: Construction of the expression plasmids pT₇ H₆ FX-hβ2m and pT₇ H₆ FX-mβ2m.

The amplified DNA fragments containing the reading frames of human- and murine β₂ -microglobulin from amino acid residues Ile₁ to Met₉₉, fused at the 5'-end to the nucleotide sequences encoding the FX_(a) cleavage site (SEQ ID NO: 37), were cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pT₇ H₆ using standard procedures.

FIGS. 3a-3b: Amino acid sequences of human- and murine β₂-microglobulin.

A: Predicted amino acid sequence of the full length reading frame encoding human β₂ -microglobulin (SEQ ID NO: 49). Amino acid residue one (Ile) in the processed mature protein is indicated. B: Predicted amino acid sequence of the full length reading frame encoding murine β₂ -microglobulin (SEQ ID NO: 50). Amino acid residue one (Ile) in the processed mature protein is indicated.

FIG. 4: Construction of the expression plasmid pT₇ H₆ FX-hGH.

The amplified DNA fragment containing the reading frame of human Growth Hormone from amino acid residues Phe₁ to Phe₁₉₁, fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pT₇ H₆ using standard procedures.

FIG. 5: Amino acid sequence of human Growth Hormone (Somatotropin).

The predicted amino acid sequence of the full length reading frame encoding human Growth Hormone (SEQ ID NO: 51). The first amino acid residue in the processed mature protein (Phe₁) is indicated.

FIG. 6: Construction of the plasmids pT₇ H₆ FX-#1, #2, and #3 expressing amino acid residue no. 20 (Ala) to 109 (Arg), amino acid residue no. 20 (Ala) to 190 (Ala), and amino acid residue no. 20 (Ala) to 521 (Lys) of the human α₂ -Macroglobulin Receptor Protein (α₂ MR) (SEQ ID NO: 52).

The amplified DNA fragments derived from the reading frame of the α₂ MR from #1: amino acid residue no. 20 (Ala) to 109 (Arg), #2: amino acid residue no. 20 (Ala) to 190 (Ala), and #3: amino acid residue no. 20 (Ala) to 521 (Lys), fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), were cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pT₇ H₆ using standard procedures.

FIG. 7: Construction of the plasmids pLcIIMLCH₆ FX-#4, #5, and #6 expressing amino acid residue no. 803 (Gly) to 1265 (Asp), amino acid residue no. 849 (Val) to 1184 (Gln), and amino acid residue no. 1184 (Gln) to 1582 (Lys) of the human α₂ -Macroglobulin Receptor Protein (α₂ MR) (SEQ ID NO: 52).

The amplified DNA fragments derived from the reading frame of the α₂ MR from #4: amino acid residue no. 803 (Gly) to 1265 (Asp), #5: amino acid residue no. 849 (Val) to 1184 (Gln), and #6: amino acid residue no. 1184 (Gln) to 1582 (Lys), fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), were cut with the restriction endonucleases Bam HI or Bcl and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pLcIIMLCH₆ FX using standard procedures.

FIG. 8: Construction of the plasmids pLcIIMLCH₆ FX-#7, #8, and #9 expressing amino acid residue no. 803 (Gly) to 1582 (Lys), amino acid residue no. 2519 (Ala) to 2941 (Ile), and amino acid residue no. 3331 (Val) to 3778 (Ile) of the human α₂ -Macroglobulin Receptor Protein (α₂ MR) (SEQ ID NO: 52).

The amplified DNA fragments derived from the reading frame of the α₂ MR from #7: amino acid residue no. 803 (Gly) to 1582 (Lys), #8: amino acid residue no. 2519 (Ala) to 2941 (Ile), and #9: amino acid residue no. 3331 (Val) to 3778 (Ile), fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), were cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pLcIIMLCH₆ FX using standard procedures.

FIGS. 9a and 9b: Amino acid sequence of human α₂ -Macroglobulin Receptor Protein (α₂ MR) (SEQ ID NO: 52).

The predicted amino acid sequence of the full length reading frame encoding the α₂ MR. Amino acid residues present in the recombinant proteins as N- or C-terminal residues are identified by their numbers above the α₂ MR sequence.

FIG. 10: Construction of the expression plasmid pLcIIMLCH₆ FX-FXΔγ.

The amplified DNA fragment containing the reading frame of bovine blood coagulation Factor X from amino acid residue Ser₈₂ to Trp₄₈₄ (FXΔγ), fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pLcIIMLCH₆ FX using standard procedures.

FIG. 11: Amino acid sequence of bovine blood coagulation Factor X (FX).

The predicted amino acid sequence of the full length reading frame encoding bovine FX (SEQ ID NO: 53). The N-terminal amino acid residue Ser₈₂ and the C-terminal Trp₄₈₄ residue in the FXΔγ construct are identified.

FIG. 12: Construction of the expression plasmid pLcIIMLCH₆ FX-K1.

The amplified DNA fragment containing the reading frame of human plasminogen kringle 1 (K1) from amino acid residue Ser₈₂ to Glu₁₆₂ (numbering as in "Glu"-plasminogen), fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pLcIIMLCH₆ FX using standard procedures.

FIG. 13: Construction of the expression plasmid pLcIIH₆ FX-K4.

The amplified DNA fragment containing the reading frame of human plasminogen kringle 4 (K4) from amino acid residue Val₃₅₄ to Ala₄₃₉ (numbering as in "Glu"-plasminogen), fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pLcIIH₆ FX using standard procedures.

FIG. 14: Amino acid sequence of human "Glu"-Plasminogen (SEQ ID NO: 54). The N- and C-terminal amino acid residues in the K1 and K4 constructs are identified by their numbers in the sequence.

FIG. 15: SDS-PAGE analysis of production and in vitro folding of recombinant human β₂ -microglobulin.

Lane 1: Crude protein extract before application to the Ni²⁺ NTA-agarose column (reduced sample).

Lane 2: Column flow-through during application of the crude protein extract onto the Ni²⁺ NTA-agarose column (reduced sample).

Lane 3: Human β₂ -microglobulin eluted from the Ni²⁺ NTA-agarose column after the cyclic folding procedure by the non-denaturing elution buffer (reduced sample).

Lane 4: Protein markers (Pharmacia, Sweden): From top of gel; 94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa (reduced sample).

Lane 5: Same as lane 3 (non-reduced sample)

Lane 6: Recombinant human β₂ -microglobulin after FX_(a) cleavage and final purification (non-reduced sample).

FIG. 16: SDS-PAGE analysis of in vitro folding of recombinant human Growth Hormone; hGH (Somatotropin).

Lane 1: Protein markers (Pharmacia, Sweden): From top of gel; 94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa (reduced sample).

Lane 2: Human hGH eluted from the Ni²⁺ NTA-agarose column after the cyclic folding procedure by the non-denaturing elution buffer (non-reduced sample).

Lane 3: Human hGH eluted from the Ni²⁺ NTA-agarose column after the cyclic folding procedure by the denaturing elution buffer B from the folding procedure (non-reduced sample).

Lanes 4-18: Fractions collected during the separation of monomeric hGH-fusion protein from dimer and multimer fusion proteins after the cyclic folding procedure by ion exchange chromatography on Q-Sepharose (Pharmacia, Sweden). The monomeric protein was eluted in a peak well separated from the peak containing the dimer and multimer proteins (non-reduced samples).

FIG. 17: SDS-PAGE analysis of in vitro folding of recombinant kringle 1 and 4 from human plasminogen and recombinant fusion protein #4 derived from human α₂ -Macroglobulin Receptor Protein (α₂ MR).

Lane 1: Protein markers (Pharmacia, Sweden): From top of gel; 94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa (reduced sample).

Lane 2: Crude K1-fusion protein extract before application to the Ni²⁺ NTA-agarose column (reduced sample).

Lane 3: K1-fusion protein eluted from the Ni²⁺ NTA-agarose column after the cyclic folding procedure by the non-denaturing elution buffer (reduced sample).

Lane 4: Same as lane 3 (non-reduced sample).

Lane 5: Flow-through from the lysine-agarose column during application of the K1-fusion protein (non-reduced sample).

Lane 6: K1-fusion protein eluted from the lysine-agarose column (non-reduced sample).

Lane 7: K4-fusion protein eluted from the Ni²⁺ NTA-agarose column after the cyclic folding procedure by the non-denaturing elution buffer (reduced sample).

Lane 8: Same as lane 7 (non-reduced sample).

Lane 9: α₂ MR#4 fusion protein eluted from the Ni²⁺ NTA-agarose column after the cyclic folding procedure by the non-denaturing elution buffer (reduced sample).

Lane 10: Same as lane 9 (non-reduced sample).

FIG. 18: Construction of the expression plasmid pT₇ H₆ FX α₂ MRBDv.

The amplified DNA fragment containing the reading frame of human α₂ -Macroglobulin from amino acid residues Val₁₂₉₉ to Ala₁₄₅₁, fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pT₇ H₆ using standard procedures.

FIG. 19: Amino acid sequence of the receptor-binding domain of human α₂-Macroglobulin (from residue Val₁₂₉₉ to Ala₁₄₅₁) (SEQ ID NO: 55).

FIG. 20: Construction of the expression plasmid pT₇ H₆ FX-TETN.

The amplified DNA fragment containing the reading frame of mature monomeric human Tetranectin from amino acid residues Glu₁ to Val₁₈₁, fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pT₇ H₆ using standard procedures.

FIG. 21: Amino acid sequence of human monomeric Tetranectin.

The predicted amino acid sequence of the full length reading frame encoding human Tetranectin (SEQ ID NO: 56). The first amino acid residue in the processed mature protein (Glu₁) is indicated.

FIG. 22: Construction of the expression plasmid pT₇ H₆ FX-DB32.

The amplified DNA fragment containing the reading frame of the artificial diabody DB32 from amino acid residues Gln₁ to Asn₂₄₆, fused at the 5'-end to the nucleotide sequence encoding the FX_(a) cleavage site IEGR (SEQ ID NO: 38), was cut with the restriction endonucleases Bam HI and Hind III (purchased from Boehringer, Germany) and ligated with T₄ DNA ligase (purchased from Boehringer, Germany) into Bam HI and Hind III cut pT₇ H₆ using standard procedures.

FIG. 23: Amino acid sequence of the artificial diabody DB32 (SEQ ID NO:57).

FIG. 24: The expression plasmid pT₇ H₆ FX-PS.4.

The construction of pT₇ H₆ FX-PS.4 expressing human psoriasin from amino acid residues Ser₂ to Gln₁₀₁ has previously been described (Hoffmann, 1994).

FIG. 25: Amino acid sequence of human psoriasin.

The predicted amino acid sequence of the full length reading frame encoding human psoriasin (SEQ ID NO: 58).

FIGS. 26a and 26b: SDS-PAGE analysis of purification and FX_(a) cleavage of recombinant Mab 32 diabody.

FIG. 26a: Different stages of the purification

Lanes 1 and 2: Crude product from folding.

Lane 3: Final purified Mab 32 diabody fusion protein product.

Lane 4: Supernatant of crude folding product after 50-fold concentration and centrifugation.

Lane 5: Pellet from crude folding product after 50-fold concentration and centrifugation.

FIG. 26b: FX_(a) cleavage of Mab 32 diabody fusion protein.

Lanes 1 and 5: Final purified Mab 32 diabody fusion protein.

Lane 2: Molar ratio 1:5 FX_(a) :Mab 32 diabody fusion protein at 37° C. for 20 hours.

Lane 3: Molar ratio 1:2 FX_(a) :Mab 32 diabody fusion protein at 37° C. for 20 hours.

Lane 4: Molar ratio 1:1 FX_(a) :Mab 32 diabody fusion protein at 37° C. for 20 hours.

FIG. 27: Suitability of glutathione as reducing agent in cyclic refolding of human β₂ -microglobulin fusion protein.

Lane 1: Reduced sample of test no. 1.

Lane 2: Non-reduced sample of test no. 1.

Lane 3: Non-reduced sample of test no. 2.

Lane 4: Non-reduced sample of test no. 3.

Lane 5: Non-reduced sample of test no. 4.

Lane 6: Non-reduced sample of test no. 5.

Lane 7: Non-reduced sample of test no. 6.

Lane 8: Non-reduced sample of test no. 7.

Lane 9: Non-reduced sample of test no. 8.

Lane 10: Non-reduced sample of test no. 9.

Lane 11: Non-reduced sample of test no. 10.

Lane 12: Non-reduced sample of test no. 11.

FIG. 28: Suitability of L-cysteine ethyl ester as reducing agent in cyclic refolding of human β₂ -microglobulin fusion protein.

Lane 1: Reduced sample of test no. 1.

Lane 2: Non-reduced sample of test no. 1.

Lane 3: Non-reduced sample of test no. 2.

Lane 4: Non-reduced sample of test no. 3.

Lane 5: Non-reduced sample of test no. 4.

Lane 6: Non-reduced sample of test no. 5.

Lane 7: Non-reduced sample of test no. 6.

Lane 8: Non-reduced sample of test no. 7.

Lane 9: Non-reduced sample of test no. 8.

Lane 10: Non-reduced sample of test no. 9.

FIG. 29: Suitability of 2-Mercaptoethanol as reducing agent in cyclic refolding of human β₂ -microglobulin fusion protein.

Lane 1: Reduced sample of test no. 1.

Lane 2: Non-reduced sample of test no. 1.

Lane 3: Non-reduced sample of test no. 2.

Lane 4: Non-reduced sample of test no. 3.

Lane 5: Non-reduced sample of test no. 4.

Lane 6: Non-reduced sample of test no. 5.

Lane 7: Non-reduced sample of test no. 6.

Lane 8: Non-reduced sample of test no. 7.

Lane 9: Non-reduced sample of test no. 8.

Lane 10: Non-reduced sample of test no. 9.

FIG. 30: Suitability of Mercaptosuccinic acid as reducing agent in cyclic refolding of human β₂ -microglobulin fusion protein.

Lane 1: Non-reduced sample of test no. 1.

Lane 2: Non-reduced sample of test no. 2.

Lane 3: Non-reduced sample of test no. 3.

Lane 4: Non-reduced sample of test no. 4.

Lane 5: Non-reduced sample of test no. 5.

Lane 6: Non-reduced sample of test no. 6.

Lane 7: Non-reduced sample of test no. 7.

Lane 8: Non-reduced sample of test no. 8.

Lane 9: Non-reduced sample of test no. 9.

FIG. 31: SDS-PAGE analysis of cyclic refolding of human β₂-microglobulin fusion protein.

Lane 1: Crude protein extract before application to the Ni²⁺ NTA-agarose column (reduced sample).

Lane 2: 8 μl sample of soluble fraction of refolded hβ₂ m as described in EXAMPLE 1.

Lane 3: 4 μl sample of soluble fraction of refolded hβ₂ m as described in EXAMPLE 1.

Lane 4: 2 μl sample of soluble fraction of refolded hβ₂ m as described in EXAMPLE 1.

Lane 5: 8 μl sample of soluble fraction of refolded hβ₂ m as described in EXAMPLE 1.

Lanes 6 and 7: hβ₂ m final product after purification by ion exchange chromatography.

Lanes 8 and 9: Refolded hβ₂ m after optimized refolding protocol as described in EXAMPLE 13.

FIG. 32: SDS-PAGE analysis of refolding of human β₂ -microglobulin fusion protein by buffer step and linear gradient.

Lane 1: Sample from soluble fraction of refolded hβ₂ m, folded by the buffer step protocol as described in EXAMPLE 13.

Lanes 2 and 3: Sample of insoluble fraction of refolded hβ₂ m, folded by the buffer step protocol as described in EXAMPLE 13.

Lane 4: Protein molecular weight markers (Pharmacia, Sweden): From top of gel; 94 kDa, 67 kDa, 43 kDa, 30 kDa, 20.1 kDa, and 14.4 kDa (reduced sample).

Lane 5: Sample of soluble fraction of refolded hβ₂ m, folded by the linear gradient protocol as described in EXAMPLE 13.

Lanes 6 and 7: Sample of insoluble fraction of refolded hβ₂ m, folded by the linear gradient protocol as described in EXAMPLE 13.

FIG. 33: The general scheme of the design of the fusion proteins described in the examples.

At the N-terminal end of the fusion protein, a "booster segment" is optionally inserted, enhancing the level of expression of the fusion protein in the cell expressing the DNA encoding the fusion protein. C-terminally to this, the "6H" indicates the 6 histidinyl residues which constitute an ion chelating site used as an "affinity handle" during purification and refolding of the fusion proteins. The "FX" at the C-terminal of the 6-histidinyl site is the FX_(a) cleavage site. Finally, the part of the fusion protein denoted "protein" represents the protein which is going to be refolded according to the method of the invention.

EXAMPLES

Examples 1 to 11 given in this section, which are used to exemplify the "cyclic folding procedure", all describe the process of folding a recombinant cleavable hybrid protein (fusion protein) produced in E. coli, purified from a crude protein extract and subjected to folding without further purification by one general procedure.

The nucleotide sequence encoding the recombinant protein, which is to be produced, is at the 5'-end fused to a nucleotide sequence encoding an amino acid sequence specifying a FX_(a) cleavage site (FX), in turn linked N-terminally to a segment containing six histidinyl residues (SEQ ID NO: 47). The linking of the FX_(a) cleavage site is normally achieved during a Polymerase Chain Reaction, wherein the 5'-terminal primer comprises nucleotides encoding this sequence. The linking of the six histidinyl residues is normally obtained by employing a vector which comprises a nucleotide fragment encoding SEQ ID NO: 47. The six histidinyl residues constitute a metal ion chelating site, which is utilized as affinity handle during purification of the fusion protein and subsequently as the point of contact to the solid matrix during the cyclic folding process. Occasionally `booster segments` (e.g. a segment derived from the N-terminus of the λcII protein in some cases followed by a segment derived from myosin light chain) are inserted N-terminal to the affinity handle in order to improve the level of expression of the fusion protein in E. coli.
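The fusion layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the leader string is the MGSHHHHHHGSIEGR sequence (SEQ ID NO: 48) quoted in Example 1, the function names are hypothetical, the payload sequence is an arbitrary placeholder rather than any protein of the examples, and cleavage is modelled simply as a cut directly after the first IEGR motif.

```python
# Leader quoted in Example 1 (SEQ ID NO: 48): Met-Gly-Ser, six His, Gly-Ser, IEGR.
LEADER = "MGSHHHHHHGSIEGR"
FXA_SITE = "IEGR"  # FX_(a) recognition sequence; cleavage occurs after the Arg


def build_fusion(protein: str, booster: str = "") -> str:
    """Concatenate optional booster segment, leader and protein (simplified)."""
    return booster + LEADER + protein


def cleave_fxa(fusion: str) -> tuple[str, str]:
    """Split a fusion sequence directly after the first IEGR motif."""
    idx = fusion.find(FXA_SITE)
    if idx < 0:
        raise ValueError("no IEGR site found in fusion sequence")
    cut = idx + len(FXA_SITE)
    return fusion[:cut], fusion[cut:]


# Hypothetical placeholder payload, not a sequence from the examples.
placeholder = "ACDEFGHIK"
tag, released = cleave_fxa(build_fusion(placeholder))
```

In this simplified model the released fragment is the placeholder itself and the tag fragment carries the six histidines. A real design would also have to check that neither the booster segment nor the protein contains a further IEGR motif.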

The fusion proteins are all designed according to the same general scheme (cf. FIG. 34). The presence of booster segments, affinity handle and FX_(a) cleavage site might complicate refolding of the recombinant protein of interest. Furthermore, the cyclic folding process is initiated immediately after the affinity purification of the fusion protein. This means that fusion protein material, which has been partially degraded by the E. coli host, is retained on the affinity matrix in addition to the full-length fusion protein. This degraded fusion protein may well interfere severely with refolding of the full-length fusion protein, thereby reducing the apparent efficiency of the process. The folding efficiency results reported in Examples 1 to 11 therefore cannot directly be compared to the efficiency of the process of refolding a purified fusion protein.

Examples 1 to 11 describe the refolding procedure for 21 different proteins, protein domains or domain-clusters, ranging from a size of 82 amino acids (K1, Example 6) to 780 amino acids (α₂ MR#7, Example 4), and the number of disulphide bridges in the proteins ranges from zero (α₂ MRAP, Example 3) to 33 (α₂ MR#4, Example 4) and 36 (α₂ MR#7, Example 4).

The efficiency of the refolding of the proteins ranges from 15 to 95%, and the yield of active protein lies in the order of 10-100 mg for refolding on a 40 ml Ni²⁺ NTA-agarose column (NTA denotes a substituted nitrilotriacetic acid).

The following tables 1-5 demonstrate the gradient profiles used in the examples. "Time" is given in minutes and "flow" in ml/min.

TABLE 1
______________________________________
Step  Time (min)  Flow (ml/min)  % A  % B
______________________________________
1     0           2              100  0
2     45          2              100  0
3     46          2              0    100
4     52          2              0    100
5     60          2              100  0
6     105         2              100  0
7     106         2              4    96
8     113         2              4    96
9     120         2              100  0
10    165         2              100  0
11    166         2              8    92
12    172         2              8    92
13    180         2              100  0
14    225         2              100  0
15    226         2              12   88
16    232         2              12   88
17    240         2              100  0
18    285         2              100  0
19    286         2              16   84
20    202         2              16   84
21    300         2              100  0
22    345         2              100  0
23    346         2              20   80
24    352         2              20   80
25    360         2              100  0
26    405         2              100  0
27    406         2              24   76
28    412         2              24   76
29    420         2              100  0
30    465         2              100  0
31    466         2              28   72
32    472         2              28   72
33    480         2              100  0
34    525         2              100  0
35    526         2              32   60
36    332         2              32   68
37    540         2              100  0
38    585         2              100  0
39    586         2              36   64
40    592         2              36   64
41    600         2              100  0
42    645         2              100  0
43    646         2              40   60
44    652         2              40   60
45    660         2              100  0
46    705         2              100  0
47    706         2              44   56
48    713         2              44   56
49    720         2              100  0
50    765         2              100  0
51    766         2              48   52
52    772         2              48   52
53    780         2              100  0
54    825         2              100  0
55    826         2              52   48
56    832         2              52   48
57    840         2              100  0
58    005         2              100  0
59    886         2              56   44
60    892         2              56   44
61    900         2              100  0
62    945         2              100  0
63    948         2              60   40
64    952         2              60   40
65    960         2              100  0
66    1005        2              100  0
67    1006        2              62   38
68    1012        2              62   38
69    1020        2              100  0
70    1065        2              100  0
71    1066        2              64   36
72    1072        2              64   36
73    1080        2              100  0
74    1125        2              100  0
75    1126        2              66   34
76    1132        2              66   34
77    1140        2              100  0
78    1185        2              100  0
79    1188        2              68   32
80    1192        2              68   32
81    1200        2              100  0
82    1245        2              100  0
83    1246        2              70   30
84    1252        2              70   30
85    1260        2              100  0
86    1305        2              100  0
87    1306        2              72   28
88    1312        2              72   28
89    1319        2              100  0
90    1364        2              100  0
91    1365        2              74   26
92    1371        2              74   26
93    1378        2              100  0
94    1423        2              100  0
95    1424        2              76   24
96    1430        2              76   24
97    1437        2              100  0
98    1482        2              100  0
99    1483        2              78   22
100   1489        2              78   22
101   1456        2              100  0
102   1541        2              100  0
103   1542        2              80   20
104   1540        2              80   20
105   1555        2              100  0
106   1556        2              82   18
107   1562        2              82   18
108   1569        2              100  0
109   1614        2              100  0
110   1615        2              84   16
111   1621        2              84   16
112   1628        2              100  0
113   1673        2              100  0
114   1674        2              88   12
115   1732        2              88   12
116   1733        2              100  0
117   1778        2              100  0
______________________________________

TABLE 2
______________________________________
Step  Time (min)  Flow (ml/min)  % A  % B
______________________________________
1     0           2              100  0
2     45          2              100  0
3     46          2              0    100
4     52          2              0    100
5     60          2              100  0
6     105         2              100  0
7     106         2              8    92
8     113         2              8    92
9     120         2              100  0
10    165         2              100  0
11    166         2              20   80
12    172         2              20   80
13    180         2              100  0
14    225         2              100  0
15    228         2              28   72
16    232         2              28   72
17    240         2              100  0
18    285         2              100  0
19    286         2              34   66
20    292         2              34   66
21    300         2              100  0
22    345         2              100  0
23    346         2              42   58
24    252         2              42   58
25    360         2              100  0
26    405         2              100  0
27    406         2              50   50
28    412         2              50   50
29    420         2              100  0
30    465         2              100  0
31    466         2              54   46
32    472         2              54   46
33    480         2              100  0
34    525         2              100  0
35    526         2              58   42
36    532         2              58   42
37    540         2              100  0
38    585         2              100  0
39    586         2              62   38
40    502         2              62   38
41    600         2              100  0
42    645         2              100  0
43    646         2              66   34
44    632         2              66   34
45    660         2              100  0
46    705         2              100  0
47    706         2              70   30
48    713         2              70   30
49    720         2              100  0
50    765         2              100  0
51    766         2              74   26
52    772         2              74   26
53    780         2              100  0
54    825         2              100  0
55    826         2              76   24
56    832         2              76   24
57    840         2              100  0
58    885         2              100  0
59    886         2              78   22
60    892         2              78   22
61    900         2              100  0
62    945         2              100  0
63    946         2              80   20
64    952         2              80   20
65    960         2              100  0
66    1005        2              100  0
67    1006        2              82   18
68    1012        2              82   18
69    1020        2              100  0
70    1065        2              100  0
71    1066        2              84   16
72    1072        2              84   16
73    1080        2              100  0
74    1125        2              100  0
75    1126        2              86   14
76    1132        2              86   14
77    1140        2              100  0
78    1185        2              100  0
79    1186        2              88   12
80    1192        2              88   12
81    1200        2              100  0
82    1245        2              100  0
83    1246        2              90   10
84    1252        2              90   10
85    1260        2              100  0
86    1305        2              100  0
87    1306        2              95   5
88    1312        2              95   5
89    1319        2              100  0
90    1364        2              100  0
______________________________________

TABLE 3
______________________________________
Step  Time (min)  Flow (ml/min)  % A    % B
______________________________________
1     0.0         1.0            0.0    100.0
2     10.0        1.0            0.0    100.0
3     40.0        1.0            100.0  0.0
4     70.0        1.0            100.0  0.0
5     70.5        1.0            10.0   90.0
6     80.0        1.0            10.0   90.0
7     110.0       1.0            100.0  0.0
8     140.0       1.0            100.0  0.0
9     140.5       1.0            20.0   80.0
10    150.0       1.0            20.0   80.0
11    100.0       1.0            100.0  0.0
12    210.0       1.0            100.0  0.0
13    210.5       1.0            30.0   70.0
14    220.0       1.0            90.0   70.0
15    250.0       1.0            100.0  0.0
16    280.0       1.0            100.0  0.0
17    280.5       1.0            40.0   60.0
18    290.0       1.0            40.0   60.0
19    320.0       1.0            100.0  0.0
20    350.0       1.0            100.0  0.0
21    350.5       1.0            50.0   50.0
22    360.0       1.0            50.0   50.0
23    390.0       1.0            100.0  0.0
24    420.0       1.0            100.0  0.0
25    420.5       1.0            60.0   40.0
26    430.0       1.0            60.0   40.0
27    460.0       1.0            100.0  0.0
28    400.0       1.0            100.0  0.0
29    490.5       1.0            70.0   30.0
30    500.0       1.0            70.0   30.0
31    530.0       1.0            100.0  0.0
32    560.0       1.0            100.0  0.0
33    560.5       1.0            80.0   20.0
34    570.0       1.0            80.0   20.0
35    600.0       1.0            100.0  0.0
36    630.0       1.0            100.0  0.0
37    630.5       1.0            85.0   15.0
38    640.0       1.0            85.0   15.0
39    670.0       1.0            100.0  0.0
40    700.0       1.0            100.0  0.0
41    700.5       1.0            88.0   12.0
42    710.0       1.0            88.0   12.0
43    740.0       1.0            100.0  0.0
44    770.0       1.0            100.0  0.0
45    770.5       1.0            90.0   10.0
46    780.0       1.0            90.0   10.0
47    810.0       1.0            100.0  0.0
48    850.0       1.0            100.0  0.0
______________________________________

TABLE 4
______________________________________
Step  Time (min)  Flow (ml/min)  % A  % B
______________________________________
1     0           2              100  0
2     45          2              100  0
3     46          2              0    100
4     52          2              0    100
5     60          2              100  0
6     105         2              100  0
7     106         2              4    96
8     113         2              4    96
9     120         2              100  0
10    165         2              100  0
11    166         2              8    92
12    172         2              8    92
13    180         2              100  0
14    225         2              100  0
15    226         2              12   88
16    232         2              12   88
17    240         2              100  0
18    285         2              100  0
19    288         2              16   84
20    292         2              16   84
21    300         2              100  0
22    343         2              100  0
23    346         2              20   80
24    352         2              20   80
25    360         2              100  0
26    405         2              100  0
27    406         2              24   76
28    412         2              24   76
29    420         2              100  0
30    465         2              100  0
31    466         2              28   72
32    472         2              28   72
33    480         2              100  0
34    525         2              100  0
35    526         2              32   68
36    532         2              32   68
37    540         2              100  0
38    585         2              100  0
39    586         2              36   64
40    592         2              36   64
41    600         2              100  0
42    645         2              100  0
43    646         2              40   60
44    652         2              40   60
45    660         2              100  0
46    705         2              100  0
47    705         2              44   56
48    719         2              44   56
49    720         2              100  0
50    765         2              100  0
51    766         2              48   52
52    772         2              40   52
53    780         2              100  0
54    825         2              100  0
55    826         2              52   48
56    832         2              52   48
57    840         2              100  0
58    885         2              100  0
59    886         2              56   44
60    892         2              56   44
61    900         2              100  0
62    945         2              100  0
63    946         2              60   40
64    952         2              60   40
65    960         2              100  0
66    1005        2              100  0
67    1006        2              64   36
68    1012        2              64   36
69    1020        2              100  0
70    1065        2              100  0
71    1066        2              68   32
72    1072        2              68   32
73    1080        2              100  0
74    1125        2              100  0
75    1126        2              70   30
76    1132        2              70   30
77    1140        2              100  0
78    1185        2              100  0
79    1186        2              72   28
80    1192        2              72   28
81    1200        2              100  0
82    1245        2              100  0
83    1246        2              75   25
84    1252        2              75   25
85    1260        2              100  0
86    1305        2              100  0
87    1306        2              80   20
88    1312        2              80   20
89    1319        2              100  0
90    1364        2              100  0
91    1365        2              85   15
92    1371        2              85   15
93    1378        2              100  0
94    1423        2              100  0
______________________________________

TABLE 5
______________________________________
Step  Time (min)  Flow (ml/min)  % A  % B
______________________________________
1     0           2              100  0
2     45          2              100  0
3     46          2              0    100
4     52          2              0    100
5     60          2              100  0
6     105         2              100  0
7     106         2              13   87
8     113         2              13   87
9     120         2              100  0
10    165         2              100  0
11    166         2              25   75
12    172         2              25   75
13    180         2              100  0
14    225         2              100  0
15    226         2              29   71
16    232         2              29   71
17    240         2              100  0
18    285         2              100  0
19    286         2              34   66
20    292         2              34   66
21    300         2              100  0
22    345         2              100  0
23    346         2              38   62
24    352         2              38   62
25    360         2              100  0
26    405         2              100  0
27    406         2              40   60
28    412         2              40   60
29    420         2              100  0
30    465         2              100  0
31    466         2              42   58
32    472         2              42   58
33    480         2              100  0
34    525         2              100  0
35    526         2              44   56
36    532         2              44   56
37    540         2              100  0
38    585         2              100  0
39    586         2              46   54
40    592         2              48   54
41    600         2              100  0
42    645         2              100  0
43    646         2              48   52
44    652         2              48   52
45    660         2              100  0
46    705         2              100  0
47    706         2              50   50
48    713         2              50   50
49    720         2              100  0
50    765         2              100  0
51    766         2              52   48
52    772         2              52   48
53    780         2              100  0
54    825         2              100  0
55    826         2              54   46
56    832         2              54   46
57    840         2              100  0
58    885         2              100  0
59    886         2              56   44
60    892         2              56   44
61    900         2              100  0
62    945         2              100  0
63    946         2              58   42
64    952         2              58   42
65    960         2              100  0
66    1005        2              100  0
67    1006        2              60   40
68    1012        2              60   40
69    1020        2              100  0
70    1065        2              100  0
71    1066        2              62   38
72    1072        2              62   38
73    1080        2              100  0
74    1125        2              100  0
75    1126        2              66   34
76    1132        2              66   34
77    1140        2              100  0
78    1185        2              100  0
79    1186        2              70   30
80    1102        2              70   30
81    1200        2              100  0
82    1245        2              100  0
83    1246        2              74   26
84    1232        2              74   26
85    1260        2              100  0
86    1305        2              100  0
87    1306        2              78   22
88    1312        2              78   22
89    1319        2              100  0
90    1364        2              100  0
91    1365        2              82   18
92    1371        2              82   18
93    1378        2              100  0
94    1423        2              100  0
______________________________________
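To read a gradient profile of this kind as a function of time, the programmed steps can be treated as breakpoints. The Python sketch below uses only the first ten steps of Table 1 and assumes that the gradient manager ramps linearly between consecutive breakpoints; that interpolation rule, the function name and the data layout are assumptions made for illustration and are not stated in the text. The helper also expects the times to be strictly increasing.

```python
from bisect import bisect_right

# First ten programmed steps of Table 1: (time in min, % A, % B).
TABLE_1_HEAD = [
    (0, 100, 0), (45, 100, 0), (46, 0, 100), (52, 0, 100), (60, 100, 0),
    (105, 100, 0), (106, 4, 96), (113, 4, 96), (120, 100, 0), (165, 100, 0),
]


def composition_at(t, steps):
    """Return (% A, % B) at time t, assuming linear ramps between steps."""
    times = [s[0] for s in steps]
    if t <= times[0]:
        return float(steps[0][1]), float(steps[0][2])
    if t >= times[-1]:
        return float(steps[-1][1]), float(steps[-1][2])
    i = bisect_right(times, t)  # index of the first breakpoint after t
    t0, a0, b0 = steps[i - 1]
    t1, a1, b1 = steps[i]
    f = (t - t0) / (t1 - t0)  # fraction of the way from breakpoint i-1 to i
    return a0 + f * (a1 - a0), b0 + f * (b1 - b0)


on_plateau = composition_at(49, TABLE_1_HEAD)   # inside the first denaturing plateau
mid_ramp = composition_at(56, TABLE_1_HEAD)     # halfway back to 100 % A
```

Under this assumption each cycle consists of a 45 minute hold in buffer A, a one minute switch to the denaturing mixture, a short plateau, and a ramp of about eight minutes back to 100% A.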

Example 1

Production and Folding of Human and Murine β₂ -microglobulin

This example describes the production in E. coli of both human β₂ -microglobulin and murine β₂ -microglobulin as FX_(a) cleavable fusion proteins, and the purification of the recombinant human and murine β₂ -microglobulin after FX_(a) cleavage.

Plasmid clones containing the full length cDNAs encoding the human and the murine β₂ -microglobulin proteins (generously provided by Dr. David N. Garboczi to Dr. Søren Buus) were used as templates in a Polymerase Chain Reaction (PCR) (Saiki et al., 1988) designed to produce cDNA fragments corresponding to the mature human (corresponding to amino acid residue Ile₁ to Met₉₉) and the mature murine (corresponding to amino acid residue Ile₁ to Met₉₉) β₂ -microglobulin proteins, by use of the primers SEQ ID NO: 3 and SEQ ID NO: 4 (for the human β₂ -microglobulin) and SEQ ID NO: 5 and SEQ ID NO: 6 (for the murine β₂ -microglobulin). The amplified coding reading frames were at their 5'-ends, via the PCR-reaction, linked to nucleotide sequences, included in SEQ ID NO: 3 and 5, encoding the amino acid sequence SEQ ID NO: 37, which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, 1987). The amplified DNA fragments were subcloned into the E. coli expression vector pT₇ H₆ (Christensen et al., 1991). The construction of the resulting plasmids pT₇ H₆ FX-hβ₂ m (expressing human β₂ -microglobulin) and pT₇ H₆ FX-mβ₂ m (expressing murine β₂ -microglobulin) is outlined in FIG. 2, and the amino acid sequences of the expressed proteins are shown in FIG. 3 (SEQ ID NO: 49 (human) and SEQ ID NO: 50 (murine)).

Human and murine β₂ -microglobulin were produced by growing and expressing the plasmids pT₇ H₆ FX-hβ₂ m and -mβ₂ m in E. coli BL21 cells on a medium scale (2×1 liter) as described by Studier and Moffatt, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing cultures at 37° C. were infected at an OD₆₀₀ of 0.8 with bacteriophage λCE6 at a multiplicity of approximately 5. Cultures were grown at 37° C. for another three hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication and total cellular protein extracted into phenol (adjusted to pH 8 with Trizma base). Protein was precipitated from the phenol phase by addition of 2.5 volumes of ethanol and centrifugation. The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M Urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 10 mM 2-mercaptoethanol and 3 mM methionine, the crude protein preparation was applied to Ni²⁺ activated NTA-agarose columns for purification (Hochuli et al., 1988) of the fusion proteins, MGSHHHHHHGSIEGR-human and murine β₂ -microglobulin (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) respectively, and subsequently to undergo the cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed undervacuum prior to addition of reductant and/or use.

Ni²⁺ activated NTA-agarose matrix (Ni²⁺ NTA-agarose) is commerciallyavailable from Diagen GmbH, Germany. During the course of this work itwas found, however, that this commercial product did not perform as wellas expected. Our observations were, that the commercial Ni²⁺ NTA-agarosematrix was easily blocked when applying the denatured and reduced totalprotein extract, that the capacity for fusion protein was lower thanexpected, and that the matrix could only be regenerated successfully afew times over.

In order to improve the performance of the Ni²⁺ NTA-agarose it wasdecided to perform a carbodiimide coupling of theN-(5-amino-1-carboxypentyl)iminodiacetic acid metal ligand (synthesisroute as described by Dobeli & Hochuli (EPO 0253 303)) to a more rigidagarose matrix (i.e. Sepharose CL-6B, Pharmacia, Sweden):

8 g. of N-(5-amino-1-carboxypentyl)iminodiacetic acid from the synthesisprocedure in 50 ml was adjusted to pH 10 by addition of 29 g. of Na₂ CO₃(10 H₂ O) and added to a stirred suspension of activated Sepharose CL-6Bin 1 M Na₂ CO₃. Reaction was allowed overnight.

The Sepharose CL-6B (initially 100 ml of suspension) was activated, after removal of water by acetone, with 7 g of 1,1'-carbonyldiimidazole under stirring for 15 to 30 min. Upon activation the Sepharose CL-6B was washed with acetone followed by water and 1 M Na₂CO₃. The NTA-agarose matrix was loaded into a column and "charged" with Ni²⁺ by slowly passing through 5 column volumes of a 10% NiSO₄ solution. The amount of Ni²⁺ on the NTA-agarose matrix prepared by this procedure was determined to be 14 μmoles per ml of matrix. The Ni²⁺ NTA-agarose matrix was packed in a standard glass column for liquid chromatography (internal diameter: 2.6 cm) to a volume of 40 ml. After charging, the Ni²⁺ NTA-agarose column was washed with two column volumes of water, one column volume of 1 M Tris-HCl pH 8 and two column volumes of loading buffer before application of the crude protein extract.
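The column figures given above can be cross-checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, the bed is assumed to be an ideal cylinder, and the charging and washing volumes assume that "column volume" refers to the 40 ml packed bed.

```python
import math

# Values stated in the text.
INTERNAL_DIAMETER_CM = 2.6       # column internal diameter
BED_VOLUME_ML = 40.0             # packed matrix volume
NI_LOADING_UMOL_PER_ML = 14.0    # Ni2+ bound per ml of matrix


def bed_height_cm(volume_ml: float, diameter_cm: float) -> float:
    """Height of an ideal cylindrical bed (1 ml equals 1 cm^3)."""
    cross_section_cm2 = math.pi * (diameter_cm / 2.0) ** 2
    return volume_ml / cross_section_cm2


def total_ni_umol(volume_ml: float, loading_umol_per_ml: float) -> float:
    """Total Ni2+ on the packed matrix."""
    return volume_ml * loading_umol_per_ml


def volume_for_column_volumes(n_column_volumes: float, bed_volume_ml: float) -> float:
    """Liquid volume corresponding to a number of column volumes."""
    return n_column_volumes * bed_volume_ml


if __name__ == "__main__":
    print(f"bed height: {bed_height_cm(BED_VOLUME_ML, INTERNAL_DIAMETER_CM):.1f} cm")
    print(f"total Ni2+: {total_ni_umol(BED_VOLUME_ML, NI_LOADING_UMOL_PER_ML):.0f} umol")
    # Assumption: the 5 column volumes of NiSO4 refer to the 40 ml bed.
    print(f"NiSO4 charge volume: {volume_for_column_volumes(5, BED_VOLUME_ML):.0f} ml")
```

Under these assumptions the bed is roughly 7.5 cm high and carries about 560 μmoles of Ni²⁺.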

Upon application of the crude protein extracts on the Ni²⁺ NTA-agarose column, the fusion proteins, MGSHHHHHHGSIEGR-hβ₂m and MGSHHHHHHGSIEGR-mβ₂m (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) respectively, were purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, 10 mM 2-mercaptoethanol, and 3 mM methionine until the optical density (OD) at 280 nm of the column eluate was stable.

The fusion proteins were refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 1, with 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 1.2 mM/0.4 mM reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 3 mM methionine, and 6 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 200 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.
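Because the gradient manager blends buffers A and B, the composition experienced by the bound protein at any point of a cycle follows from simple linear mixing. The sketch below is a minimal illustration, assuming ideal volumetric mixing; the function name `mix` and the 25% B example are hypothetical and are not taken from table 1.

```python
# Component concentrations in mM, as stated for buffers A and B.
BUFFER_A = {"NaCl": 500.0, "Tris-HCl": 50.0, "GSH": 1.2, "GSSG": 0.4}
BUFFER_B = {
    "urea": 8000.0,
    "NaCl": 500.0,
    "Tris-HCl": 50.0,
    "methionine": 3.0,
    "GSH": 6.0,
}


def mix(buffer_a: dict, buffer_b: dict, fraction_b: float) -> dict:
    """Concentrations (mM) after blending A and B at the given fraction of B.

    A component missing from one buffer is treated as 0 mM in that buffer.
    """
    if not 0.0 <= fraction_b <= 1.0:
        raise ValueError("fraction_b must lie between 0 and 1")
    names = sorted(set(buffer_a) | set(buffer_b))
    return {
        name: (1.0 - fraction_b) * buffer_a.get(name, 0.0)
        + fraction_b * buffer_b.get(name, 0.0)
        for name in names
    }


if __name__ == "__main__":
    # Illustrative blend only; the actual profile is defined in table 1.
    for name, conc in mix(BUFFER_A, BUFFER_B, 0.25).items():
        print(f"{name}: {conc:.2f} mM")
```

At an illustrative 25% B the blend contains about 2 M urea, 2.4 mM reduced and 0.3 mM oxidized glutathione, while NaCl and Tris-HCl stay constant because both buffers contain them at the same concentration.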

After completion of the cyclic folding procedure the hβ₂m and mβ₂m fusion proteins were eluted from the Ni²⁺ NTA-agarose columns with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM EDTA pH 8.

Fusion proteins that were aggregated and precipitated on the Ni²⁺ NTA-agarose columns were eluted in buffer B. Approximately 75% of the fusion protein material was eluted by the non-denaturing elution buffer (see FIG. 16, lanes 2 and 3).

As judged by non-reducing SDS-PAGE analysis, approximately 70% of the soluble hβ₂m fusion protein material (corresponding to 40 mg of hβ₂m fusion protein) appeared monomeric (see FIG. 15, lanes 5 and 3), whereas 25% of the mβ₂m fusion protein appeared monomeric (corresponding to 20 mg of mβ₂m fusion protein). The overall efficiency of the folding procedure is therefore approximately 50% for the hβ₂m fusion protein and less than 20% for the mβ₂m fusion protein.
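The overall efficiencies quoted above are consistent with multiplying the fraction of material eluted by the non-denaturing buffer by the fraction of that material that appears monomeric. A minimal Python sketch of this arithmetic follows; the function name is hypothetical, and the product rule is an interpretation of how the figures were derived rather than a formula stated in the text.

```python
def overall_folding_efficiency(soluble_fraction: float, monomer_fraction: float) -> float:
    """Fraction of applied fusion protein recovered as soluble monomer."""
    for value in (soluble_fraction, monomer_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return soluble_fraction * monomer_fraction


if __name__ == "__main__":
    # About 75% eluted by non-denaturing buffer; 70% (human) or 25% (murine) monomeric.
    human = overall_folding_efficiency(0.75, 0.70)
    murine = overall_folding_efficiency(0.75, 0.25)
    print(f"human beta2m fusion protein:  {human:.1%}")
    print(f"murine beta2m fusion protein: {murine:.1%}")
```

The products, roughly 52% and 19%, agree with the stated "approximately 50%" and "less than 20%".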

Monomeric hβ₂m and mβ₂m fusion proteins were purified from dimer and higher order multimers by ion exchange chromatography on S-Sepharose (Pharmacia, Sweden): The fusion proteins eluted by the non-denaturing elution buffer (approximately 70% of the fusion protein material) were gel filtered into a buffer containing 5 mM NaCl and 5 mM Tris-HCl pH 8 on Sephadex G-25 and diluted 1:1 with water before being applied onto the S-Sepharose ion exchange columns. Fusion proteins were eluted over 5 column volumes with a linear gradient from 2.5 mM NaCl, 2.5 mM Tris-HCl pH 8 to 100 mM NaCl, 25 mM Tris-HCl pH 8. The monomeric hβ₂m as well as mβ₂m fusion proteins eluted in the very beginning of the gradient, whereas dimers and higher order multimers eluted later. Fractions containing the monomeric fusion proteins were diluted with water, reloaded onto the S-Sepharose columns and one-step eluted in 1 M NaCl, 50 mM Tris-HCl pH 8.
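The sample dilution and the linear elution gradient described above can be expressed as two small functions. This is a sketch under stated assumptions: "diluted 1:1" is read as one volume of sample plus one volume of water, the gradient is taken as ideally linear in column volumes, and both function names are hypothetical.

```python
def dilute(concentration_mm: float, sample_parts: float, diluent_parts: float) -> float:
    """Concentration after mixing sample with diluent (e.g. 1:1 with water)."""
    return concentration_mm * sample_parts / (sample_parts + diluent_parts)


def linear_gradient(start_mm: float, end_mm: float,
                    gradient_cv: float, elapsed_cv: float) -> float:
    """Concentration at a point of a linear gradient, measured in column volumes."""
    if gradient_cv <= 0:
        raise ValueError("gradient length must be positive")
    # Clamp so that positions before/after the gradient return the end points.
    progress = min(max(elapsed_cv / gradient_cv, 0.0), 1.0)
    return start_mm + progress * (end_mm - start_mm)


if __name__ == "__main__":
    # 5 mM NaCl / 5 mM Tris-HCl diluted 1:1 matches the 2.5 mM gradient start.
    print("loaded sample NaCl (mM):", dilute(5.0, 1, 1))
    for cv in (0.0, 2.5, 5.0):
        nacl = linear_gradient(2.5, 100.0, 5.0, cv)
        tris = linear_gradient(2.5, 25.0, 5.0, cv)
        print(f"{cv:.1f} CV: NaCl {nacl:.2f} mM, Tris-HCl {tris:.2f} mM")
```

Read this way, the diluted sample has the same ionic strength as the starting buffer of the gradient, which is what allows the fusion proteins to bind before elution begins.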

The monomeric fusion proteins were cleaved with the restriction protease FX_(a) overnight at room temperature in a weight to weight ratio of approximately 200 to one.
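As a rough guide to scale, the amount of FX_(a) implied by a 200:1 weight ratio can be computed directly. The sketch below is illustrative: the function name is hypothetical, and the substrate masses used are the monomer estimates quoted before ion exchange (40 mg and 20 mg), which is an assumption because the amount actually taken into the cleavage step is not stated.

```python
def protease_mass_mg(substrate_mg: float, substrate_to_protease_ratio: float) -> float:
    """Protease mass for a given substrate:protease weight ratio."""
    if substrate_to_protease_ratio <= 0:
        raise ValueError("ratio must be positive")
    return substrate_mg / substrate_to_protease_ratio


if __name__ == "__main__":
    # Assumed substrate amounts: monomer estimates from the SDS-PAGE analysis.
    for name, mass_mg in (("human beta2m fusion", 40.0), ("murine beta2m fusion", 20.0)):
        print(f"{name}: {protease_mass_mg(mass_mg, 200.0):.2f} mg FXa")
```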

After cleavage the recombinant hβ₂m and mβ₂m proteins were purified from FX_(a) and from the N-terminal fusion tail liberated from the cleaved fusion protein by ion exchange chromatography on Q-Sepharose columns (Pharmacia, Sweden): Upon gel filtration on Sephadex G-25 into 5 mM NaCl, 5 mM Tris-HCl pH 8 and 1:1 dilution with water, recombinant hβ₂m and mβ₂m were eluted in a linear gradient (over 5 column volumes) from 2.5 mM NaCl, 2.5 mM Tris-HCl pH 8 to 100 mM NaCl, 25 mM Tris-HCl pH 8. Fractions containing the cleaved recombinant proteins were diluted with water, reloaded onto the Q-Sepharose columns and one-step eluted in 1 M NaCl, 50 mM Tris-HCl pH 8. Recombinant hβ₂m and mβ₂m proteins were gel filtered into freshly prepared 20 mM NH₄HCO₃ and lyophilized twice.

SDS-PAGE analysis of the production of recombinant human β₂-microglobulin is presented in FIG. 15.

The yield of fully processed recombinant human β₂-microglobulin produced by this procedure was 30 mg.

The yield of fully processed recombinant murine β₂-microglobulin produced by this procedure was 10 mg.

Comparison of recombinant human β₂-microglobulin with purified natural human β₂-microglobulin was kindly carried out by Dr. Søren Buus in two different assays:

1. Recombinant human β₂-microglobulin and natural human β₂-microglobulin were found to react with both a monoclonal and a monospecific antibody with identical affinity.

2. Recombinant human β₂-microglobulin and natural human β₂-microglobulin were found, in a binding inhibition experiment using radiolabelled ligands, to bind natural affinity-purified heavy chain class I K^(d) molecules with identical affinity.

Recombinant murine β₂-microglobulin was found to bind natural class I heavy chain molecules with an affinity 5 times lower than the human β₂-microglobulin. This result is in good agreement with previous results from the literature using natural material.

Example 2

Production and folding of Human Growth Hormone (Somatotropin)

This example describes the production in E. coli of human growth hormone (hGH) as a FX_(a) cleavable fusion protein, and the purification of the recombinant hGH after FX_(a) cleavage.

A plasmid clone containing the cDNA encoding hGH (generously provided by Dr. Henrik Dalbøge; Dalbøge et al., 1987) was used as template in a Polymerase Chain Reaction (PCR) (Saiki et al., 1988), using the primers SEQ ID NO: 7 and SEQ ID NO: 8, designed to produce a cDNA fragment corresponding to the mature hGH protein (corresponding to amino acid residues Phe₁ to Phe₁₉₁). The amplified coding reading frame was linked at the 5'-end, via the PCR reaction, to a nucleotide sequence, included in SEQ ID NO: 7, encoding the amino acid sequence SEQ ID NO: 37, which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, 1987). The amplified DNA fragment was subcloned into the E. coli expression vector pT₇H₆ (Christensen et al., 1991). The construction of the resulting plasmid pT₇H₆FX-hGH (expressing human growth hormone) is outlined in FIG. 4, and the amino acid sequence of the expressed protein (SEQ ID NO: 51) is shown in FIG. 5.

Recombinant human growth hormone was produced by growing and expressing the plasmid pT₇H₆FX-hGH in E. coli BL21 cells at medium scale (2×1 liter) as described by Studier and Moffatt, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing cultures at 37° C. were infected at OD₆₀₀ 0.8 with bacteriophage λCE6 at a multiplicity of approximately 5. Cultures were grown at 37° C. for another three hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication, and total cellular protein was extracted into phenol (adjusted to pH 8 with Trizma base). Protein was precipitated from the phenol phase by addition of 2.5 volumes of ethanol and centrifugation. The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 50 mM dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 5 mM 2-mercaptoethanol and 1 mM methionine, the crude protein preparation was applied to a Ni²⁺-activated NTA-agarose column (Ni²⁺ NTA-agarose) for purification (Hochuli et al., 1988) of the fusion protein MGSHHHHHHGSIEGR-hGH (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), which subsequently underwent the cyclic folding procedure.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Upon application of the crude protein extract on the Ni²⁺ NTA-agarose column, the fusion protein, MGSHHHHHHGSIEGR-hGH (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), was purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, 5 mM 2-mercaptoethanol, and 1 mM methionine until the optical density (OD) at 280 nm of the eluate was stable.

The fusion protein was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 2, with 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 1.0 mM/0.1 mM reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 1 mM methionine, and 5 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 200 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the hGH fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM EDTA pH 8. Fusion protein that was aggregated and precipitated on the Ni²⁺ NTA-agarose column was eluted in buffer B.

Approximately 80% of the fusion protein material was eluted by the non-denaturing elution buffer (see FIG. 16, lanes 2 and 3). As judged by non-reducing SDS-PAGE analysis, 90% of the soluble fusion protein material (corresponding to approximately 70 mg of fusion protein) appeared monomeric (see FIG. 16, lane 2), yielding an overall efficiency of the folding procedure of approximately 70%.

Monomeric hGH fusion protein was purified from dimer and higher order multimers by ion exchange chromatography on Q-Sepharose (Pharmacia, Sweden): After gel filtration into a buffer containing 25 mM NaCl and 25 mM Tris-HCl pH 8 on Sephadex G-25, the fusion protein material eluted by the non-denaturing buffer was applied onto a Q-Sepharose ion exchange column. Fusion protein was eluted over 5 column volumes with a linear gradient from 25 mM NaCl, 25 mM Tris-HCl pH 8 to 200 mM NaCl, 50 mM Tris-HCl pH 8. The monomeric hGH fusion protein eluted in the beginning of the gradient, whereas dimers and higher order multimers eluted later. NiSO₄ and iminodiacetic acid (IDA, adjusted to pH 8 with NaOH) were added to 1 mM to the fractions containing the pure monomeric fusion protein, which was then cleaved with the restriction protease FX_(a) for 5 hours at 37° C. in a weight to weight ratio of approximately 100 to one. FX_(a) was inhibited after cleavage by addition of benzamidine hydrochloride to 1 mM.

After cleavage, and upon gel filtration on Sephadex G-25 into 8 M urea, 50 mM Tris-HCl pH 8 to remove Ni²⁺ IDA and benzamidine, the recombinant hGH protein was isolated from uncleaved fusion protein and the liberated fusion tail by passage through a small Ni²⁺ NTA-agarose column followed in-line by a small Nd³⁺ NTA-agarose column and subsequently a non-Ni²⁺-activated NTA-agarose column to ensure complete removal of FX_(a) and of Ni²⁺ and Nd³⁺, respectively. Recombinant hGH was purified from a minor fraction of recombinant breakdown product by ion exchange chromatography on Q-Sepharose: hGH was eluted in a linear gradient (over 5 column volumes) from 8 M urea, 50 mM Tris-HCl pH 8 to 8 M urea, 250 mM NaCl, 25 mM Tris-HCl pH 8. Fractions containing the cleaved, purified recombinant protein were gel filtered into freshly prepared 20 mM NH₄HCO₃ and lyophilized twice.

SDS-PAGE analysis of the production and folding of recombinant human growth hormone is presented in FIG. 16.

The yield of fully processed recombinant human growth hormone produced by this procedure was 10 mg.

The recombinant human growth hormone produced by this procedure co-migrated both in reducing and non-reducing SDS-PAGE and in non-denaturing PAGE analysis with biologically active recombinant human growth hormone generously provided by Novo-Nordisk A/S.

Example 3

Production and folding of human α₂MRAP

The plasmid used for expression in E. coli BL21 cells of the human α₂-Macroglobulin Receptor Associated Protein (α₂MRAP), pT₇H₆FX-α₂MRAP, and the conditions used for production of the fusion protein have previously been described by us in Nykjær et al., J. Biol. Chem. 267: 14543-14546, 1992. The primers SEQ ID NO: 9 and SEQ ID NO: 10 were used in the PCR employed for amplifying the α₂MRAP-encoding DNA.

Crude protein extract, precipitated from the phenol phase of the protein extraction of cells from 2 liters of culture of E. coli BL21 cells expressing MGSHHHHHHGSIEGR-α₂MRAP (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 50 mM dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, Sweden) into 8 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 1 mM methionine, the crude protein preparation was applied to a Ni²⁺-activated NTA-agarose matrix (Ni²⁺ NTA-agarose) for purification (Hochuli et al., 1988) of the fusion protein MGSHHHHHHGSIEGR-α₂MRAP (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), which subsequently underwent the cyclic folding process.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

Upon application of the crude protein extract on the Ni²⁺ NTA-agarose column, the fusion protein, MGSHHHHHHGSIEGR-α₂MRAP (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48), was purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, and 1 mM methionine until the optical density (OD) at 280 nm of the eluate was stable.

The fusion protein was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 3, with 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂ and 1 mM 2-mercaptoethanol as buffer A and 6 M guanidinium chloride, 50 mM Tris-HCl pH 8, 2 mM CaCl₂ and 1 mM 2-mercaptoethanol as buffer B.

After completion of the cyclic folding procedure the α₂MRAP fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM EDTA pH 8.

Virtually no fusion protein was found to be aggregated or precipitated on the Ni²⁺ NTA-agarose column. The estimated yield of α₂MRAP fusion protein was 60 mg and the efficiency of the folding procedure was close to 95%.

The fusion protein MGSHHHHHHGSIEGR-α₂MRAP (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) was cleaved with the bovine restriction protease FX_(a) overnight at room temperature in a weight to weight ratio of 200:1 in the elution buffer. Upon gel filtration on Sephadex G-25 into 100 mM NaCl, 25 mM Tris-HCl pH 8, the protein solution was passed through a Ni²⁺ NTA-agarose column, thereby removing uncleaved fusion protein and the liberated N-terminal fusion tail originating from cleaved fusion proteins. Finally the protein solution was diluted 1:4 with water and the α₂MRAP protein was purified from FX_(a) by ion exchange chromatography on Q-Sepharose (Pharmacia, Sweden). The Q-Sepharose column was eluted with a linear gradient over 6 column volumes from 25 mM NaCl, 25 mM Tris-HCl pH 8 to 250 mM NaCl, 25 mM Tris-HCl pH 8. The α₂MRAP protein eluted in the very beginning of the linear gradient, whereas FX_(a) eluted later.

The yield of α₂MRAP protein produced and refolded by this procedure was 40 mg.

The ligand binding characteristics (i.e. binding to the α₂-Macroglobulin Receptor and interference with the binding of human Urokinase Plasminogen Activator-Plasminogen Activator inhibitor type-I complex to the α₂-M Receptor) have, according to Dr. Nykjær, been found identical to the ligand binding characteristics of the purified natural protein.

Example 4

Production and folding of domains and domain-clusters from the α₂-M Receptor

The human α₂-Macroglobulin Receptor/Low Density Lipoprotein Receptor-Related Protein (α₂-MR) is a 600 kDa endocytotic membrane receptor. α₂-MR is synthesized as a 4524 amino acid single chain precursor protein. The precursor is processed into an 85 kDa transmembrane β-chain and a 500 kDa α-chain, non-covalently bound to the extracellular domain of the β-chain. The α₂-MR is known to bind Ca²⁺ in a structure-dependent manner (i.e. the reduced protein does not bind Ca²⁺) and is believed to be multifunctional in the sense that α₂-MR binds ligands of different classes.

The entire amino acid sequence of the α-chain can be represented by clusters of three types of repeats also found in other membrane bound receptors and in various plasma proteins:

A: This type of repeat spans approximately 40 amino acid residues and is characterized by the sequential appearance of the six cysteinyl residues contained in the repeat. Some authors have named this repeat complement-type domain.

B: This type of repeat also spans approximately 40 amino acid residues and is characterized by the sequential appearance of the six cysteinyl residues contained in the repeat. In the literature this repeat has been named EGF-type domain.

C: This type of repeat spans approximately 55 amino acid residues and is characterized by the presence of the consensus sequence SEQ ID NO: 39.

This example describes the production in E. coli of a number of domains and domain-clusters derived from the α₂-MR protein as FX_(a) cleavable fusion proteins and the purification, in vitro folding, and the FX_(a) cleavage and processing of these recombinant proteins.

A plasmid clone containing the full length cDNA encoding the human α₂-MR protein (generously provided by Dr. Joachim Herz; Herz et al., EMBO J., 7: 4119-4127, 1988) was used as template in a series of Polymerase Chain Reactions (PCR) designed to produce cDNA fragments corresponding to a number of polypeptides representing domains and domain-clusters derived from the α₂-MR protein:

#1: Contains two domains of the A-type, corresponding to amino acid residues 20 to 109 in the α₂-MR protein. The primers SEQ ID NO: 11 and SEQ ID NO: 12 were used in the PCR.

#2: Contains two domains of the A-type followed by two type-B domains, corresponding to amino acid residues 20 to 190 in the α₂-MR protein. The primers SEQ ID NO: 11 and SEQ ID NO: 13 were used in the PCR.

#3: Identical to #2 followed by a region containing YWTD repeats, corresponding to amino acid residues 20 to 521. The primers SEQ ID NO: 11 and SEQ ID NO: 14 were used in the PCR.

#4: Contains one type-B domain, followed by 8 type-A domains and finally two type-B domains, corresponding to amino acid residues 803 to 1265 in the α₂-MR protein. The primers SEQ ID NO: 15 and SEQ ID NO: 16 were used in the PCR.

#5: Contains only the 8 type-A domains also present in #4, corresponding to amino acid residues 849 to 1184 in the α₂-MR protein. The primers SEQ ID NO: 17 and SEQ ID NO: 18 were used in the PCR.

#6: Contains the two C-terminal type-B domains from #4, followed by 8 YWTD repeats and one type-B domain, corresponding to amino acid residues 1184 to 1502 in the α₂-MR protein. The primers SEQ ID NO: 19 and SEQ ID NO: 20 were used in the PCR.

#7: Contains the whole region included in constructs #4 to #6, corresponding to amino acid residues 803 to 1582 in the α₂-MR protein. The primers SEQ ID NO: 15 and SEQ ID NO: 20 were used in the PCR.

#8: Contains 10 type-A domains, corresponding to amino acid residues 2520 to 2941 in the α₂-MR protein. The primers SEQ ID NO: 21 and SEQ ID NO: 22 were used in the PCR.

#9: Contains 11 type-A domains, corresponding to amino acid residues 3331 to 3778 in the α₂-MR protein. The primers SEQ ID NO: 23 and SEQ ID NO: 24 were used in the PCR.
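For bookkeeping, the nine constructs listed above can be held in a small table from which fragment lengths and nesting relations are derived. The Python sketch below is illustrative only: the `Construct` type and helper names are hypothetical, and the residue ranges and primer numbers are copied as stated in the list.

```python
from typing import NamedTuple, Tuple


class Construct(NamedTuple):
    number: int
    first_residue: int
    last_residue: int
    primers: Tuple[int, int]  # SEQ ID NOs of the PCR primers

    @property
    def length(self) -> int:
        """Number of alpha2-MR residues covered (inclusive range)."""
        return self.last_residue - self.first_residue + 1


# Residue ranges and primers as given for constructs #1 to #9.
CONSTRUCTS = [
    Construct(1, 20, 109, (11, 12)),
    Construct(2, 20, 190, (11, 13)),
    Construct(3, 20, 521, (11, 14)),
    Construct(4, 803, 1265, (15, 16)),
    Construct(5, 849, 1184, (17, 18)),
    Construct(6, 1184, 1502, (19, 20)),
    Construct(7, 803, 1582, (15, 20)),
    Construct(8, 2520, 2941, (21, 22)),
    Construct(9, 3331, 3778, (23, 24)),
]


def is_within(inner: Construct, outer: Construct) -> bool:
    """True if the inner residue range lies entirely inside the outer one."""
    return (outer.first_residue <= inner.first_residue
            and inner.last_residue <= outer.last_residue)


if __name__ == "__main__":
    by_number = {c.number: c for c in CONSTRUCTS}
    for c in CONSTRUCTS:
        print(f"#{c.number}: residues {c.first_residue}-{c.last_residue}, "
              f"{c.length} residues, primers SEQ ID NO: {c.primers[0]} and {c.primers[1]}")
    print("#5 within #4:", is_within(by_number[5], by_number[4]))
```

Construct #1, for example, spans 90 residues, in line with two A-type repeats of approximately 40 residues each plus flanking sequence.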

The amplified nucleotide sequences encoding the domains and domain-clusters were linked at their 5'-end, via the PCR reaction, to nucleotide sequences (included in SEQ ID NO: 11, 15, 17, 19, 21 and 23) encoding the amino acid sequence SEQ ID NO: 37, which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, Methods in Enzymology, 152: 461-481, 1987). The amplified DNA fragments were subcloned either into the E. coli expression vector pT₇H₆ (Christensen et al., FEBS Letters, 295: 181-184, 1991) or into the expression plasmid pLcIIMLCH₆, which is modified from pLcIIMLC (Nagai et al., Nature, 332: 284-286, 1988) by the insertion of an oligonucleotide encoding six histidinyl residues C-terminal of the myosin light chain fragment. The construction of the resulting plasmids pT₇H₆FX-#1 to #3 and pLcIIMLCH₆FX-#4 to #9 is outlined in FIGS. 6-8, and the amino acid sequence of the expressed protein (SEQ ID NO: 52) is shown in FIG. 9.

The domains and domain-clusters subcloned in the pT₇H₆FX series were grown and expressed in E. coli BL21 cells at medium scale (2 liter) as described by Studier and Moffatt, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing cultures at 37° C. were infected at OD₆₀₀ 0.8 with bacteriophage λCE6 at a multiplicity of approximately 5. Cultures were grown at 37° C. for another three hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication, and total cellular protein was extracted into phenol (adjusted to pH 8 with Trizma base).

The domain-clusters subcloned in the pLcIIMLCH₆ series were grown and expressed in E. coli QY13 cells as described in Nagai and Thøgersen, Methods in Enzymology, 152: 461-481, 1987. Exponentially growing cultures (4 liter) at 30° C. were transferred at OD₆₀₀ 1.0 to 42° C. for 15 min. This heat shock induces synthesis of the fusion proteins. The cultures were further incubated at 37° C. for three to four hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication, and total cellular protein was extracted into phenol (adjusted to pH 8 with Trizma base).

Crude protein was precipitated from the phenol phase by addition of 3.5 volumes of ethanol and centrifugation. The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, Sweden) into 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 10 mM 2-mercaptoethanol and 2 mM methionine, the crude protein preparations were applied to Ni²⁺-activated NTA-agarose columns for purification (Hochuli et al., 1988) of the fusion proteins, which subsequently underwent the cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

Upon application of the crude protein extracts on the Ni²⁺ NTA-agarose column, the fusion proteins were purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, 10 mM 2-mercaptoethanol, and 2 mM methionine until the optical density (OD) at 280 nm of the eluate was stable.

Each of the fusion proteins was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 4, with 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂, 0.33 mM methionine, and 2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 4 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂, 2 mM methionine, and 3 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 100 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the fusion proteins representing domains and domain-clusters derived from the α₂-MR protein were eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 5 mM EDTA pH 8. Fusion proteins that were aggregated and precipitated on the Ni²⁺ NTA-agarose column were eluted in buffer B.

Approximately 75% of the fusion protein material expressed from the plasmids pT₇H₆FX-#1 and #2, representing the N-terminal two and four cysteine-rich domains of the α₂-MR protein, was eluted from the Ni²⁺ NTA-agarose column by the non-denaturing buffer. The majority of this fusion protein material appeared as monomers as judged by non-reducing SDS-PAGE analysis. The yields of monomeric fusion protein #1 and #2 were estimated to be approximately 50 mg.

Approximately 50% of the fusion protein material expressed from all other expression plasmids representing domain-clusters derived from the α₂-MR protein was eluted from the Ni²⁺ NTA-agarose column by the non-denaturing buffer. Between 30% (fusion proteins #5 and #7) and 65% (fusion protein #4) of these fusion proteins appeared as monomers as judged by non-reducing SDS-PAGE analysis (see FIG. 17, lanes 9 and 10).

Each fusion protein eluted by the non-denaturing elution buffer was cleaved with the restriction protease FX_(a) overnight at room temperature in an estimated weight to weight ratio of 100 to one.

Upon gel filtration on Sephadex G-25 into 100 mM NaCl, 25 mM Tris-HCl pH 8, the protein solution was passed through a Ni²⁺ NTA-agarose column, thereby removing uncleaved fusion protein and the liberated N-terminal fusion tail originating from the cleaved fusion proteins. FX_(a) was removed from the solution by passing the recombinant protein solutions through a small column of SBTI-agarose (Soy Bean Trypsin Inhibitor immobilized on Sepharose CL-6B (Pharmacia, Sweden)).

SDS-PAGE analysis of the refolded, soluble fusion protein product #4 is presented in FIG. 17, lanes 9 and 10, showing reduced and unreduced samples, respectively. The mobility increase observed for the unreduced sample reflects the compactness of the polypeptide due to the presence of 33 disulphide bridges.

Each of the recombinant proteins was found to bind Ca²⁺ in a structure-dependent manner.

It was found by Dr. Søren Moestrup that a monoclonal antibody, A2MRα-5, derived from the natural human α₂-MR, bound the recombinant proteins expressed by the constructs #4, #6, and #7, whereas a monospecific antibody, A2MRα-3, also derived from natural α₂-MR, was found to bind the recombinant protein expressed by construct #8. The binding specificity of both antibodies is structure dependent (i.e. the antibodies react neither with reduced α₂-MR nor with reduced recombinant protein).

Example 5

Production and folding of bovine coagulation Factor X_(a) (FX_(a))

This example describes the production in E. coli of one fragment derived from bovine FX_(a) as a FX_(a) cleavable fusion protein and the purification, in vitro folding, and the processing of the recombinant protein.

The cDNA encoding bovine FX was cloned by specific amplification in a Polymerase Chain Reaction (PCR) of the nucleotide sequences encoding bovine FX from amino acid residues Ser₈₂ to Trp₄₀₄ (SEQ ID NO: 2, residues 82-484) (FXΔγ, amino acid numbering relates to the full coding reading frame) using first-strand oligo-dT primed cDNA synthesized from total bovine liver RNA as template. Primers used in the PCR were SEQ ID NO: 25 and SEQ ID NO: 26. RNA extraction and cDNA synthesis were performed using standard procedures.

The amplified reading frame encoding FXΔγ was linked at the 5'-end, via the PCR reaction, to nucleotide sequences encoding the amino acid sequence SEQ ID NO: 37, which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, Methods in Enzymology, 152: 461-481, 1987). The amplified DNA fragment was cloned into the E. coli expression vector pLcIIMLCH₆, which is modified from pLcIIMLC (Nagai et al., Nature, 332: 284-286, 1988) by the insertion of an oligonucleotide encoding six histidinyl residues C-terminal of the myosin light chain fragment. The construction of the resulting plasmid pLcIIMLCH₆FX-FXΔγ is outlined in FIG. 10, and the amino acid sequence of the expressed protein (SEQ ID NO: 53) is shown in FIG. 11.

The pLcIIMLCH₆FX-FXΔγ plasmid was grown and expressed in E. coli QY13 cells as described in Nagai and Thøgersen (Methods in Enzymology, 152: 461-481, 1987). Exponentially growing cultures at 30° C. were incubated at OD₆₀₀ 1.0 at 42° C. for 15 min. This heat shock induces synthesis of the fusion proteins. The cultures were further incubated at 37° C. for three to four hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication, and total cellular protein was extracted into phenol (adjusted to pH 8 with Trizma base).

Crude protein was precipitated from the phenol phase by addition of 2.5 volumes of ethanol and centrifugation. The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 10 mM 2-mercaptoethanol, the crude protein preparation was applied to a Ni²⁺-activated NTA-agarose matrix for purification (Hochuli et al., 1988) of the FXΔγ fusion protein, which subsequently underwent the cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

Upon application of the crude protein extract on the Ni²⁺ NTA-agarose column, the fusion protein was purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, and 10 mM 2-mercaptoethanol until the optical density (OD) at 280 nm of the eluate was stable.

The fusion protein was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 5, with 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂, and 2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂, and 3 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 100 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the FXΔγ fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 5 mM EDTA pH 8. Fusion protein that was aggregated and precipitated on the Ni²⁺ NTA-agarose column was eluted in buffer B.

Approximately 33% of the FXΔγ fusion protein material was eluted from the Ni²⁺ NTA-agarose column by the non-denaturing buffer. The amount of FXΔγ fusion protein was estimated at 15 mg. Only about one third of this fusion protein material appeared as monomers as judged by non-reducing SDS-PAGE analysis, corresponding to an overall efficiency of the folding procedure of approximately 10%.

FXΔγ fusion protein in non-denaturing buffer was activated by passing the recombinant protein solution through a small column of trypsin-agarose (trypsin immobilized on Sepharose CL-6B (Pharmacia, Sweden)).

The activated recombinant FXΔγ fusion protein was assayed for proteolytic activity and substrate specificity profile using standard procedures with chromogenic substrates. The activity and substrate specificity profile were indistinguishable from those obtained for natural bovine FX_(a).

Example 6

Production and folding of kringle domains 1 and 4 from human plasminogen

This example describes the production in E. coli of the lysine binding kringle domains 1 and 4 from human plasminogen (K1 and K4, respectively) as FX_(a) cleavable fusion proteins and the purification and in vitro folding of the K1- and K4-fusion proteins.

A plasmid clone containing the full length cDNA encoding human plasminogen cloned into the general cloning vector pUC18 (generously provided by Dr. Earl Davis, Seattle, USA) was used as template in a Polymerase Chain Reaction (PCR) designed to produce cDNA fragments corresponding to K1 (corresponding to amino acid residues Ser₈₁ to Glu₁₆₂ in so-called Glu-plasminogen) and K4 (corresponding to amino acid residues Val₃₅₄ to Ala₄₃₉ in so-called Glu-plasminogen). The primers SEQ ID NO: 27 and SEQ ID NO: 28 were used in the PCR producing K1 and the primers SEQ ID NO: 29 and SEQ ID NO: 30 were used in the PCR producing K4. The amplified reading frames encoding K1 and K4 were at their 5'-ends, via the PCR-reaction, linked to nucleotide sequences, included in SEQ ID NO: 27 and SEQ ID NO: 29, encoding the amino acid sequence SEQ ID NO: 37 which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, Methods in Enzymology, 152: 461-481, 1987). The amplified K1 DNA fragment was cloned into the E. coli expression vector pLcIIMLCH₆, which is modified from pLcIIMLC (Nagai et al., Nature, 332: 284-286, 1988) by the insertion of an oligonucleotide encoding six histidinyl residues C-terminal of the myosin light chain fragment. The construction of the resulting plasmid pLcIIMLCH₆ FX-K1 is outlined in FIG. 12. The amplified K4 DNA fragment was cloned into the E. coli expression vector pLcIIH₆, which is modified from pLcII (Nagai and Thøgersen, Methods in Enzymology, 152: 461-481, 1987) by the insertion of an oligonucleotide encoding six histidinyl residues C-terminal of the cII fragment. The construction of the resulting plasmid pLcIIH₆ FX-K4 is outlined in FIG. 13, and the amino acid sequence of human "Glu" plasminogen (SEQ ID NO: 54) is shown in FIG. 14.

Both the pLcIIMLCH₆ FX-K1 plasmid and the pLcIIH₆ FX-K4 plasmid were grown and expressed in E. coli QY13 cells as described in Nagai and Thøgersen, Methods in Enzymology, 152: 461-481, 1987. Exponentially growing cultures at 30° C. were at OD₆₀₀ 1.0 transferred to 42° C. for 15 min. This heat shock induced synthesis of the fusion proteins. The cultures were further incubated at 37° C. for three to four hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication and total cellular protein extracted into phenol (adjusted to pH 8 with Trisma base).

Crude protein was precipitated from the phenol phase by addition of 2.5 volumes of ethanol and centrifugation. The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, Sweden) into 8 M Urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 10 mM 2-mercaptoethanol, and 2 mM methionine the crude protein preparation was applied to a Ni²⁺ activated NTA-agarose matrix for purification (Hochuli et al., 1988) of the K1- and K4-fusion proteins and subsequently to undergo the cyclic folding procedure.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

Upon application of the crude protein extracts on the Ni²⁺ NTA-agarose column, the fusion proteins were purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, 10 mM 2-mercaptoethanol, and 2 mM methionine until the optical density (OD) at 280 nm of the column eluate was stable.

The fusion protein was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 4 with 0.5 M NaCl, 50 mM Tris-HCl pH 8, 10 mM 6-aminohexanoic acid (ε-aminocaproic acid, ε-ACA), 0.33 mM methionine, and 2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 4 M Urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 10 mM ε-ACA, 2 mM methionine, and 3 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 100 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.
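The amount of hydrogen peroxide needed for such a stock solution can be estimated with a short calculation. The following is a minimal sketch only, not part of the described procedure: it assumes the usual stoichiometry H₂O₂ + 2 GSH → GSSG + 2 H₂O, complete reaction, and a negligible volume change on addition. The function name `oxidized_stock`, the 10 ml batch volume and the 20 mM target (100 times 0.2 mM) are illustrative assumptions.

```python
def oxidized_stock(gsh_molar, gsh_volume_ml, target_gssg_molar, h2o2_molar=9.9):
    """Return (microlitres of H2O2 to add, residual GSH in mol/l).

    Assumptions (not from the source text): H2O2 + 2 GSH -> GSSG + 2 H2O goes
    to completion, and the small added volume does not change concentrations.
    """
    gssg_mmol = target_gssg_molar * gsh_volume_ml      # mol/l * ml = mmol
    h2o2_mmol = gssg_mmol                              # one H2O2 per GSSG formed
    gsh_left_mmol = gsh_molar * gsh_volume_ml - 2.0 * gssg_mmol
    if gsh_left_mmol < 0:
        raise ValueError("not enough GSH for the requested GSSG level")
    h2o2_ul = h2o2_mmol / h2o2_molar * 1000.0          # mmol / (mol/l) = ml, then to microlitres
    return h2o2_ul, gsh_left_mmol / gsh_volume_ml


# Hypothetical batch: 10 ml of 0.2 M reduced glutathione, 20 mM GSSG wanted.
h2o2_ul, gsh_left = oxidized_stock(0.2, 10.0, 0.020)
print(f"add {h2o2_ul:.1f} microlitres of 9.9 M H2O2; residual GSH {gsh_left * 1000:.0f} mM")
```

Under these assumptions roughly 20 μl of 9.9 M H₂O₂ per 10 ml of 0.2 M reduced glutathione would be required; the actual amounts used are not stated in the text.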

After completion of the cyclic folding procedure each of the K1 and K4 fusion proteins was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 5 mM EDTA pH 8. Fusion proteins that were aggregated and precipitated on the Ni²⁺ NTA-agarose column were eluted in buffer B.

Virtually all of the K1- and K4-fusion protein material was eluted from the Ni²⁺ NTA-agarose columns by the non-denaturing buffer. The estimated yields of K1-fusion protein and K4-fusion protein were approximately 60 mg. Virtually all of the K1-fusion protein as well as the K4-fusion protein appeared as monomers as judged by non-reducing SDS-PAGE analysis, corresponding to an efficiency of the folding procedure above 90%.

SDS-PAGE analysis of the production of recombinant plasminogen kringles 1 and 4 is presented in FIG. 17.

The K1-fusion protein and the K4-fusion protein were further purified by affinity chromatography on lysine-Sepharose CL-6B (Pharmacia, Sweden). The fusion proteins were eluted from the affinity columns by a buffer containing 0.5 M NaCl, 50 mM Tris-HCl pH 8, 10 mM ε-ACA.

Binding to lysine-Sepharose is normally accepted as an indication of correct folding of lysine binding kringle domains.

The three dimensional structures of recombinant K1 and K4 protein domains, produced by this cyclic folding procedure and which have been fully processed by liberation from the N-terminal fusion tail and subsequently purified by ion exchange chromatography, have been confirmed by X-ray diffraction (performed by Dr. Robert Huber) and two dimensional NMR analysis (performed by stud. scient. Peter Reinholdt and Dr. Flemming Poulsen). The general yield of fully processed recombinant K1 and K4 protein domains by this procedure is 5 mg/liter culture.

Example 7

Production in E. coli and refolding of recombinant fragments derived from human α₂-Macroglobulin and chicken Ovostatin

This example describes the production in E. coli of the receptor-binding domain of human α₂-Macroglobulin (α₂-MRBDv) as a FX_(a) cleavable fusion protein, and the purification of the recombinant α₂-MRBDv after FX_(a) cleavage.

The 462 bp DNA fragment encoding the α₂-Macroglobulin reading frame from amino acid residues Val₁₂₉₉ to Ala₁₄₅₁ (α₂-MRDv) was amplified in a Polymerase Chain Reaction (PCR), essentially following the protocol of Saiki et al. (1988). pA2M (generously provided by Dr. T. Kristensen) containing the full length cDNA of human α₂-Macroglobulin was used as template, and the oligonucleotides SEQ ID NO: 31 and SEQ ID NO: 32 as primers. The amplified coding reading frame was at the 5'-end, via the PCR-reaction, linked to a nucleotide sequence, included in SEQ ID NO: 7, encoding the amino acid sequence SEQ ID NO: 37 which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, 1987). The amplified DNA fragment was subcloned into the E. coli expression vector pT₇ H₆ (Christensen et al., 1991). The construction of the resulting plasmid pT₇ H₆ FX-α₂ MRDv (expressing human α₂-MRDv) is outlined in FIG. 18 and the amino acid sequence of the expressed protein is shown in FIG. 19 (SEQ ID NO: 55).

Recombinant human α₂ MRDv was produced by growing and expressing the plasmid pT₇ H₆ FX-α₂ MRDv in E. coli BL21 cells in a medium scale (2×1 liter) as described by Studier and Moffat, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing cultures at 37° C. were at OD₆₀₀ 0.8 infected with bacteriophage λCE6 at a multiplicity of approximately 5. Cultures were grown at 37° C. for another three hours before cells were harvested by centrifugation. Cells were lysed by osmotic shock and sonication and total cellular protein extracted into phenol (adjusted to pH 8 with Trisma base). Protein was precipitated from the phenol phase by addition of 2.5 volumes of ethanol and centrifugation. The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 50 mM dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M Urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol the crude protein preparation was applied to a Ni²⁺ activated NTA-agarose column (Ni²⁺ NTA-agarose) for purification (Hochuli et al., 1988) of the fusion protein, MGSHHHHHHGSIEGR-α₂ MRDv (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) and subsequently to undergo the cyclic folding procedure.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Upon application of the crude protein extract on the Ni²⁺ NTA-agarose column, the fusion protein, MGSHHHHHHGSIEGR-α₂ MRDv (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) was purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, and 10 mM 2-mercaptoethanol, until the optical density (OD) at 280 nm of the eluate was stable.

The fusion protein was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 4 and 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 5 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 200 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the α₂ MRDv fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 20 mM EDTA pH 8. Fusion protein that was aggregated and precipitated on the Ni²⁺ NTA-agarose column was eluted in buffer B.

Approximately 50% of the fusion protein material was eluted in the aqueous elution buffer. Half of this fusion protein material appeared monomeric and folded as judged by non-reducing SDS-PAGE analysis.

Recombinant α₂ MRDv protein was liberated from the N-terminal fusion tail by cleavage with the restriction protease FX_(a) at room temperature in a weight to weight ratio of approximately 50 to one for four hours. After cleavage the α₂ MRDv protein was isolated from uncleaved fusion protein, the liberated fusion tail, and FX_(a), by gel filtration on Sephadex G-25 into 10 mM NaCl, 50 mM Tris-HCl pH 8, followed by ion exchange chromatography on Q-Sepharose: α₂ MRDv was eluted in a linear gradient (over 10 column volumes) from 10 mM NaCl, 10 mM Tris-HCl pH 8 to 500 mM NaCl, 10 mM Tris-HCl pH 8. The α₂ MRDv protein eluted at 150 mM NaCl.
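The position of the 150 mM NaCl elution point within the gradient can be estimated as follows. This is a sketch under stated assumptions: an ideal linear gradient from 10 mM to 500 mM NaCl over 10 column volumes, with no mixing or delay volume. The function names `gradient_conc` and `cv_at_conc` are hypothetical.

```python
def gradient_conc(cv, start_mM=10.0, end_mM=500.0, length_cv=10.0):
    """NaCl concentration (mM) after `cv` column volumes of an ideal linear gradient."""
    if not 0.0 <= cv <= length_cv:
        raise ValueError("cv lies outside the gradient")
    return start_mM + (end_mM - start_mM) * cv / length_cv


def cv_at_conc(conc_mM, start_mM=10.0, end_mM=500.0, length_cv=10.0):
    """Column volumes into the gradient at which `conc_mM` is reached."""
    return (conc_mM - start_mM) / (end_mM - start_mM) * length_cv


# Elution point reported in the text: 150 mM NaCl.
elution_cv = cv_at_conc(150.0)
print(f"150 mM NaCl is reached after about {elution_cv:.2f} column volumes")
```

With these idealized assumptions the protein would elute a little less than three column volumes into the gradient.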

The recombinant α₂ MRDv domain binds to the α₂ M receptor with a similar affinity for the receptor as exhibited by the complete α₂-Macroglobulin molecule (referring to the estimated K_(D) in one ligand-one receptor binding (Moestrup and Gliemann 1991)). Binding analysis was performed by Dr. Søren K. Moestrup and stud. scient. Kare Lehmann.

Example 8

Production in E. coli and refolding of recombinant fragments derived from the trout virus VHS envelope glycoprotein G

Expression and in vitro refolding of recombinant fragments derived from the envelope glycoprotein G from the trout virus VHS in E. coli as FX_(a) cleavable fusion proteins was performed using general strategies and methods analogous to those outlined in the general description of the "cyclic refolding procedure" and given in Examples 1 and 6.

Example 9

Production in E. coli and refolding of recombinant human Tetranectin and recombinant fragments derived from human Tetranectin

Tetranectin is a tetrameric protein consisting of four identical and non-covalently linked single chain subunits of 181 amino acid residues (17 kDa). Each subunit contains three disulphide bridges and binds Ca²⁺. Tetranectin is found in plasma and associated with extracellular matrix. Tetranectin binds specifically to plasminogen kringle 4. This binding can specifically be titrated by lysine or ω-amino acids.

The cDNA encoding the reading frame corresponding to the mature tetranectin single chain subunit was cloned by specific amplification in a Polymerase Chain Reaction (PCR) (Saiki et al., 1988) of the nucleotide sequences from amino acid residue Glu₁ to Val₁₈₁ using 1st strand oligo-dT primed cDNA synthesized from total human placental RNA as template. Primers used in the PCR were SEQ ID NO: 33 and SEQ ID NO: 34. RNA extraction and cDNA synthesis were performed using standard procedures.

The amplified reading frame encoding the monomer subunit of tetranectin was at the 5'-end, via the PCR-reaction, linked to nucleotide sequences encoding the amino acid sequence SEQ ID NO: 37 which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, 1987). A glycine residue was, due to the specific design of the 5'-PCR primer (SEQ ID NO: 33), inserted between the C-terminal arginine residue of the FX_(a) cleavage site (SEQ ID NO: 37) and the tetranectin Glu₁-residue. The amplified DNA fragment was subcloned into the E. coli expression vector pT₇ H₆ (Christensen et al., 1991). The construction of the resulting plasmid pT₇ H₆ FX-TETN (expressing the tetranectin monomer) is outlined in FIG. 20 and the amino acid sequence of the expressed protein is shown in FIG. 21 (SEQ ID NO: 56).

To prepare the tetranectin monomer, the plasmid pT₇ H₆ FX-TETN was grown in medium scale (4×1 liter; 2×TY medium, 5 mM MgSO₄ and 100 μg ampicillin) in E. coli BL21 cells, as described by Studier and Moffat, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing cultures at 37° C. were at OD₆₀₀ 0.8 infected with bacteriophage λCE6 at a multiplicity of approximately 5. Cultures were grown at 37° C. for another three hours and the cells harvested by centrifugation. Cells were resuspended in 150 ml of 0.5 M NaCl, 10 mM Tris-HCl pH 8, and 1 mM EDTA pH 8. Phenol (100 ml adjusted to pH 8) was added and the mixture sonicated to extract the total protein. Protein was precipitated from the phenol phase by 2.5 volumes of ethanol and centrifugation.

The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M Urea, 1 M NaCl, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol, the crude protein preparation was applied to a Ni²⁺ activated NTA-agarose column (Ni²⁺ NTA-agarose, 75 ml pre-washed with 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol) for purification (Hochuli et al., 1988) of the fusion protein, MGSHHHHHHGSIEGR-TETN (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48).

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

The column was washed with 200 ml of 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol (Buffer I) and 100 ml of 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol (Buffer II). The MGSHHHHHHGSIEGR-TETN fusion protein was eluted with Buffer II containing 10 mM EDTA pH 8 and the eluate was gel filtered on Sephadex G-25 using Buffer I as eluant.

The eluted protein was then refolded. The fusion protein MGSHHHHHHGSIEGR-TETN (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) was mixed with 100 ml Ni²⁺ NTA-agarose. The resin containing bound protein was packed into a 5 cm diameter column and washed with Buffer I supplemented with CaCl₂ to 2 mM. The fusion protein was refolded on the Ni²⁺ NTA-agarose column at 11-12° C. using a gradient manager profile as described in table 4 and 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂ and 2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂ and 3 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 200 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the tetranectin fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 25 mM EDTA pH 8. The tetranectin fusion protein was cleaved with FX_(a) at 4° C. overnight in a molar ratio of 1:300. After FX_(a) cleavage the protein sample was concentrated 10 fold by ultrafiltration on a YM10 membrane (Amicon). Recombinant tetranectin was, after ten times dilution of the protein sample with 2 mM CaCl₂, isolated by ion-exchange chromatography on Q-Sepharose (Pharmacia, Sweden) in a linear gradient over 10 column volumes from 10 mM Tris-HCl pH 8, 2 mM CaCl₂ to 10 mM Tris-HCl pH 8, 2 mM CaCl₂, and 0.5 M NaCl.

Recombinant tetranectin produced by this procedure was analyzed by Dr. Inge Clemmensen, Rigshospitalet, Copenhagen. Dr. Clemmensen found that the recombinant tetranectin with respect to binding to plasminogen kringle 4 and expression of antigenic sites behaved identically to naturally isolated human tetranectin.

Preliminary experiments comparing the efficiency of refolding, using the "cyclic refolding procedure", of recombinant Tetranectin fusion protein bound to the Ni²⁺ NTA-agarose column versus recombinant Tetranectin contained in a dialysis bag indicate a significantly improved yield of soluble monomer from the solution refolding strategy. However, if either product of the cycling procedures is subjected to disulphide re-shuffling in solution in the presence of 5 mM CaCl₂ virtually all of the polypeptide material is converted to the correctly folded Tetranectin tetramer.

Denatured and reduced recombinant authentic Tetranectin contained in a dialysis bag was refolded over 15 cyclic exposures to buffer B (6 M Urea, 100 mM NaCl, 50 mM Tris-HCl pH 8, 2 mM/0.2 mM reduced/oxidized glutathione, 2 mM CaCl₂ and 0.5 mM methionine) and buffer A (100 mM NaCl, 50 mM Tris-HCl pH 8, 2 mM/0.2 mM reduced/oxidized glutathione, 2 mM CaCl₂, and 0.5 mM methionine).

Example 10

Production and folding of a diabody expressed intracellularly in E. coli: Mab 32 diabody directed against tumour necrosis factor.

Diabodies (described in Holliger et al., 1993) are artificial bivalent and bispecific antibody fragments.

This example describes the production in E. coli of a diabody directed against tumour necrosis factor alpha (TNF-α), derived from the mouse monoclonal antibody Mab 32 (Rathjen et al., 1991, 1992; Australian Patent Appl. 7,576; EP-A-486,526).

A phagemid clone, pCANTAB5-myc-Mab32-5, containing Mab32 encoded in the diabody format (PCT/GB93/02492) was generously provided by Dr. G. Winter, Cambridge Antibody Technology (CAT) Ltd., Cambridge, UK. pCANTAB5-myc-Mab32-5 DNA was used as template in a Polymerase Chain Reaction (PCR) (Saiki et al., 1988), using the primers SEQ ID NO: 35 and SEQ ID NO: 36, designed to produce a cDNA fragment corresponding to the complete artificial diabody. The amplified coding reading frame was at the 5'-end, via the PCR-reaction, linked to a nucleotide sequence, included in SEQ ID NO: 35, encoding the amino acid sequence SEQ ID NO: 37 which constitutes a cleavage site for the bovine restriction protease FX_(a) (Nagai and Thøgersen, 1987). The amplified DNA fragment was subcloned into the E. coli expression vector pT₇ H₆ (Christensen et al., 1991). The construction of the resulting plasmid pT₇ H₆ FX-DB32 (expressing the Mab32 diabody) is outlined in FIG. 22 and the amino acid sequence of the expressed protein is shown in FIG. 23 (SEQ ID NO: 57).

To prepare the diabody fragment, the plasmid pT₇ H₆ FX-DB32 was grown in medium scale (4×1 liter; 2×TY medium, 5 mM MgSO₄ and 100 μg ampicillin) in E. coli BL21 cells, as described by Studier and Moffat, J. Mol. Biol., 189: 113-130, 1986. Exponentially growing cultures at 37° C. were at OD₆₀₀ 0.8 infected with bacteriophage λCE6 at a multiplicity of approximately 5. Forty minutes after infection, rifampicin was added (0.2 g in 2 ml methanol per liter media). Cultures were grown at 37° C. for another three hours and the cells harvested by centrifugation. Cells were resuspended in 150 ml of 0.5 M NaCl, 10 mM Tris-HCl pH 8, and 1 mM EDTA pH 8. Phenol (100 ml adjusted to pH 8) was added and the mixture sonicated to extract the total protein. Protein was precipitated from the phenol phase by 2.5 volumes of ethanol and centrifugation.

The protein pellet was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 0.1 M dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M Urea, 1 M NaCl, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol, the crude protein preparation was applied to a Ni²⁺ activated NTA-agarose column (Ni²⁺ NTA-agarose, 75 ml pre-washed with 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol) for purification (Hochuli et al., 1988) of the fusion protein, MGSHHHHHHGSIEGR-DB32 (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48).

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

The column was washed with 200 ml of 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and 10 mM 2-mercaptoethanol (Buffer I) and 100 ml of 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol (Buffer II). The MGSHHHHHHGSIEGR-DB32 fusion protein was eluted with Buffer II containing 10 mM EDTA pH 8 and the eluate was gel filtered on Sephadex G-25 using Buffer I as eluant.

The eluted protein was then refolded. The fusion protein MGSHHHHHHGSIEGR-DB32 (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) was mixed with 100 ml Ni²⁺ NTA-agarose. The resin containing bound protein was packed into a 5 cm diameter column and washed with Buffer I. The fusion protein was refolded on the Ni²⁺ NTA-agarose column at 11-12° C. using a gradient manager profile as described in table 4 and 0.5 M NaCl, 50 mM Tris-HCl pH 8, and 2.0 mM/0.2 mM reduced/oxidized glutathione as buffer A and 8 M urea, 1 M NaCl, 50 mM Tris-HCl pH 8, and 3 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 200 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the DB32 fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 25 mM EDTA pH 8 and adjusted to 5 mM GSH, 0.5 mM GSSG and incubated for 12 to 15 hours at 20° C. The fusion protein was then concentrated 50 fold by ultrafiltration using YM10 membranes and clarified by centrifugation.

The DB32 fusion protein dimer was purified by gel filtration using a Superose 12 column (Pharmacia, Sweden) with PBS as eluant.

The overall yield of correctly folded DB32 fusion protein from this procedure was 4 mg per liter.

An analysis by non-reducing SDS-PAGE from different stages of the purification is shown in FIG. 26.

The MGSHHHHHHGSIEGR (SEQ ID NO: 48) N-terminal fusion peptide was cleaved off the DB32 protein by cleavage with the restriction protease FX_(a) (molar ratio 1:5 FX_(a):DB32 fusion protein) at 37° C. for 20 hours. This is shown as the appearance of a lower molecular weight band just below the uncleaved fusion protein in FIG. 26.

The refolded DB32 protein was analyzed by Cambridge Antibody Technology Ltd. (CAT). DB32 was found to bind specifically to TNF-α and to compete with the Mab32 whole antibody for binding to TNF-α. Furthermore both DB32 and Mab32 were competed in binding to TNF-α by sheep anti-301 antiserum, which had been raised by immunizing sheep with a peptide encoding the first 18 amino acids of human TNF-α and comprised at least part of the epitope recognised by the murine Mab32.

Example 11

Production and refolding of human psoriasin in E. coli.

Psoriasin is a single domain Ca²⁺-binding protein of 100 amino acid residues (11.5 kDa). Psoriasin contains a single disulphide bridge. The protein, which is believed to be a member of the S100 protein family, is highly up-regulated in psoriatic skin and in primary human keratinocytes undergoing abnormal differentiation.

The plasmid pT₇ H₆ FX-PS.4 (kindly provided by Dr. P. Madsen, Institute of Medical Biochemistry, University of Aarhus, Denmark) has previously been described by Hoffmann et al. (1994). The nucleotide sequence encoding the psoriasin protein from Ser₂ to Gln₁₀₁ is in the 5'-end linked to the nucleotide sequence encoding the amino acid sequence MGSHHHHHHGSIEGR (SEQ ID NO: 48). A map of pT₇ H₆ FX-PS.4 is given in FIG. 24 and the amino acid sequence of human psoriasin is listed in FIG. 25 (SEQ ID NO: 58).

Recombinant human psoriasin was grown and expressed from the plasmid pT₇ H₆ FX-PS.4 in E. coli BL21 cells and total cellular protein extracted as described (Hoffmann et al., 1994). Ethanol precipitated total protein was dissolved in a buffer containing 6 M guanidinium chloride, 50 mM Tris-HCl pH 8 and 50 mM dithioerythritol. Following gel filtration on Sephadex G-25 (Pharmacia, LKB, Sweden) into 8 M Urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8 and 5 mM 2-mercaptoethanol the crude protein preparation was applied to a Ni²⁺ activated NTA-agarose column (Ni²⁺ NTA-agarose) for purification (Hochuli et al., 1988) of the fusion protein, MGSHHHHHHGSIEGR-psoriasin (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) and subsequently to undergo the cyclic folding procedure.

Preparation and "charging" of the Ni²⁺ NTA-agarose column is described under Example 1.

All buffers prepared for liquid chromatography were degassed under vacuum prior to addition of reductant and/or use.

Upon application of the crude protein extract on the Ni²⁺ NTA-agarose column, the fusion protein, MGSHHHHHHGSIEGR-psoriasin (wherein MGSHHHHHHGSIEGR is SEQ ID NO: 48) was purified from the majority of E. coli and λ phage proteins by washing with one column volume of the loading buffer followed by 6 M guanidinium chloride, 50 mM Tris-HCl, and 5 mM 2-mercaptoethanol until the optical density (OD) at 280 nm of the eluate was stable.

The fusion protein was refolded on the Ni²⁺ NTA-agarose column using a gradient manager profile as described in table 4 and 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂ and 1.0 mM/0.1 mM reduced/oxidized glutathione as buffer A and 8 M urea, 0.5 M NaCl, 50 mM Tris-HCl pH 8, 2 mM CaCl₂ and 5 mM reduced glutathione as buffer B. The reduced/oxidized glutathione solution was freshly prepared as a 200 times stock solution by addition of 9.9 M H₂O₂ to a stirred solution of 0.2 M reduced glutathione before addition to buffer A.

After completion of the cyclic folding procedure the psoriasin fusion protein was eluted from the Ni²⁺ NTA-agarose column with a buffer containing 0.5 M NaCl, 50 mM Tris-HCl, 10 mM EDTA pH 8. Fusion protein that was aggregated and precipitated on the Ni²⁺ NTA-agarose column was eluted in buffer B.

Approximately 95% of the fusion protein material was eluted by the non-denaturing elution buffer. As judged by non-reducing SDS-PAGE analysis 75% of the soluble fusion protein material appeared to be monomeric, yielding an overall efficiency of the folding procedure of approximately 70%. The efficiency of the previously described refolding procedure for production of recombinant human psoriasin (Hoffmann et al., 1994) was estimated to be less than 25%.
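The overall efficiencies quoted in the examples are consistent with multiplying the fraction of fusion protein eluted by the non-denaturing buffer by the fraction of that material appearing monomeric. The sketch below merely restates this arithmetic with the figures given in the text for the FXΔγ, α₂ MRDv and psoriasin fusion proteins; the function name `overall_efficiency` and the dictionary keys are hypothetical, and the α₂ MRDv product is a derived value that the text itself does not state.

```python
def overall_efficiency(fraction_eluted, fraction_monomer):
    """Overall folding efficiency as the product of the two reported fractions."""
    return fraction_eluted * fraction_monomer


# (fraction eluted in non-denaturing buffer, fraction of that material monomeric)
examples = {
    "FX-delta-gamma": (0.33, 1.0 / 3.0),   # "approximately 33%", "about one third"
    "alpha2-MRDv": (0.50, 0.50),           # "approximately 50%", "half"
    "psoriasin": (0.95, 0.75),             # "approximately 95%", "75%"
}

efficiencies = {name: overall_efficiency(*f) for name, f in examples.items()}
for name, eff in efficiencies.items():
    print(f"{name}: {eff:.0%}")
```

The products come to about 11%, 25% and 71%, in line with the approximately 10% and approximately 70% stated for the FXΔγ and psoriasin fusion proteins.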

The psoriasin fusion protein was cleaved with FX_(a) in a molar ratio of 100:1 for 48 hours at room temperature. After gel filtration into a buffer containing 20 mM Na-acetate pH 5 and 20 mM NaCl on Sephadex G-25 the protein sample was applied onto an S-Sepharose ion exchange column (Pharmacia). Monomeric recombinant psoriasin was eluted over 5 column volumes with a linear gradient from 20 mM Na-acetate pH 5, 20 mM NaCl to 0.5 M NaCl. Monomeric psoriasin eluted at 150 mM NaCl. Dimeric and higher order multimers of psoriasin together with uncleaved fusion protein eluted later in the gradient. Fractions containing the cleaved purified recombinant protein were gel filtered on Sephadex G-25 into a buffer containing 150 mM NaCl, 10 mM Tris-HCl pH 7.4 and stored at 4° C.

Example 12

Evaluation procedure for suitability testing of thiol compounds for use as reducing agents in cyclic refolding and determination of optimal levels of denaturants and disulphide reshuffling agents for optimization of cyclic refolding procedures.

In order to improve the yield of correctly folded protein obtainable from cyclic refolding the number of productive cycles should be maximized (see SUMMARY OF THE INVENTION). Productive cycles are characterized by steps of denaturation where misfolded protein, en route to dead-end aggregate conformational states, is salvaged into unfolded conformational states while most of the already correctly folded protein remains in conformational states able to snap back into the refolded state during the refolding step of the cycle.
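The notion of a productive cycle can be illustrated with a deliberately simple bookkeeping model. The sketch below is a toy model and not a description of measured behaviour: the three-state scheme, the function name `simulate_cycles` and all parameter values are hypothetical assumptions chosen only to show how correctly folded protein can accumulate over repeated cycles when the denaturation step salvages misfolded protein while sparing most of the folded protein.

```python
def simulate_cycles(n_cycles, fold_yield, salvage, loss):
    """Toy model of cyclic refolding; returns the folded fraction after each refolding step.

    All parameters are hypothetical fractions between 0 and 1:
    fold_yield: share of unfolded protein that folds correctly in a refolding step
    salvage:    share of misfolded protein returned to the unfolded pool on denaturation
    loss:       share of correctly folded protein unfolded again on denaturation
    """
    unfolded, misfolded, folded = 1.0, 0.0, 0.0
    history = []
    for _ in range(n_cycles):
        # Refolding step: the unfolded pool partitions into folded and misfolded protein.
        folded += fold_yield * unfolded
        misfolded += (1.0 - fold_yield) * unfolded
        unfolded = 0.0
        history.append(folded)
        # Denaturation step: part of the misfolded protein is salvaged,
        # and a small part of the folded protein is lost back to the unfolded pool.
        unfolded = salvage * misfolded + loss * folded
        misfolded *= 1.0 - salvage
        folded *= 1.0 - loss
    return history


# Illustrative values only: 20% folding yield per step, 50% salvage, 5% loss.
history = simulate_cycles(15, fold_yield=0.2, salvage=0.5, loss=0.05)
print(", ".join(f"{f:.2f}" for f in history))
```

In this toy model the folded fraction rises with each cycle and levels off at a value set by the balance between salvage and loss, which is one way to picture why the denaturation conditions should discriminate between misfolded and correctly folded protein.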

A number of disulphide bridge containing proteins, like β₂-microglobulin, are known to refold with high efficiency (>95%) when subjected to high levels of denaturing agents as long as their disulphide bridges remain intact.

This example describes how to evaluate suitability of a thiol compound for use in cyclic refolding on the basis of its ability to discriminate correct from incorrect disulphide bridges and how to optimize levels of denaturing agent and/or reducing agent to be used in the denaturation steps in order to maximize the number of productive cycles. As model system we chose a mixture of mono-, di- and multimeric forms of purified recombinant human β₂-microglobulin. Our specific aim was to analyze the stability of different topological forms of human β₂-microglobulin against reduction by five different reducing agents at various concentrations of denaturing agent.

Human β₂-microglobulin (produced as described in Example 13) in 6 M guanidinium chloride, 50 mM Tris-HCl and 10 mM 2-mercaptoethanol pH 8 was gel filtered into non-denaturing buffer (50 mM Tris-HCl, 0.5 M NaCl pH 8). Only a fraction of the protein in the sample was soluble in the non-denaturing buffer. After 48 hours exposure to air, the protein solution appeared unclear. Non-reducing SDS-PAGE analysis showed that most of the protein had been oxidized into multimeric forms and only a small fraction was oxidized and monomeric (FIG. 27, lane 1).

The protein solution was aliquoted into a number of tubes and varying amounts of urea added while keeping the concentration of protein and salt at a constant level.
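The aliquoting scheme can be expressed as a short calculation. The sketch below is illustrative only: it takes the 200 μl final volume, the 36 μl of protein solution, the 4 μl of 0.2 M reducing agent stock and the 10 M urea stock (Buffer B) from the sample composition tables of this example, and the function name `sample_volumes` is hypothetical.

```python
def sample_volumes(urea_M, total_ul=200.0, protein_ul=36.0, reductant_ul=4.0,
                   urea_stock_M=10.0):
    """Return (microlitres Buffer A, microlitres Buffer B) for a target urea molarity.

    Buffer B carries the urea; Buffer A (same salt and Tris, no urea) fills the
    remaining volume so that protein and salt concentrations stay constant.
    """
    buffer_b = urea_M * total_ul / urea_stock_M
    buffer_a = total_ul - protein_ul - reductant_ul - buffer_b
    if buffer_a < 0:
        raise ValueError("target urea concentration too high for this layout")
    return buffer_a, buffer_b


# Urea series as used with 4 ul of reducing agent stock per sample.
series = [sample_volumes(u) for u in (0, 1, 2, 3, 4, 5, 6, 7)]

# Final concentrations implied by the fixed volumes.
reductant_mM = 0.2 * 4.0 / 200.0 * 1000.0      # 0.2 M stock, 4 ul in 200 ul
protein_mg_per_ml = 2.0 * 36.0 / 200.0         # 2 mg/ml stock, 36 ul in 200 ul
print(series, reductant_mM, protein_mg_per_ml)
```

The calculation reproduces the 4 mM final reducing agent concentration stated in this example; the control sample without reducing agent is not covered by the default arguments.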

Reducing agent, either glutathione, cysteine ethyl ester, N-acetyl-L-cysteine, mercaptosuccinic acid or 2-mercaptoethanol was added to the ensemble of protein samples with varying urea concentrations. Each reducing agent was added to a final concentration of 4 mM. The protein samples were incubated at room temperature for 10 min and then free thiol groups were blocked by addition of iodoacetic acid to a final concentration of 12 mM. Finally, the protein samples were analyzed by non-reducing SDS-PAGE (FIGS. 27-30, 31, and data not shown). The compositions of the test-samples used in the non-reducing SDS-PAGE as well as the results are given below in the following tables; in the rows indicating the ability of the chosen reducing agent to reduce disulphide bridges the marking "+++" indicates good ability, "++" indicates intermediate ability, "+" indicates weak ability, whereas no marking indicates that no measurable effect could be observed.

    __________________________________________________________________________
    Composition of samples used in SDS-PAGE of FIG. 27
    Test no.                    1    2    3    4    5    6    7    8    9    10   11
    __________________________________________________________________________
    μl protein solution         36   36   36   36   36   36   36   36   36   36   36
    μl Buffer A                 160  160  140  120  100  80   70   60   50   40   20
    μl Buffer B                 0    0    20   40   60   80   90   100  110  120  140
    μl GSH                      0    4    4    4    4    4    4    4    4    4    4
    M urea                      0    0    1    2    3    4    4.5  5    5.5  6    7
    Ability to reduce wrong               +    +    ++   ++   +++  +++  +++  +++  +++
    disulphide bridges
    Ability to reduce correct                                                +    +++
    disulphide bridges
    __________________________________________________________________________
    Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl
    Buffer B: 10 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl
    GSH: 0.2 M glutathione
    Protein solution: 2 mg/ml hβ₂m, 50 mM Tris-HCl pH 8, 0.5 M NaCl

    __________________________________________________________________________
    Composition of samples used in SDS-PAGE of FIG. 28
    Test no.                    1   2   3   4   5   6   7   8   9
    __________________________________________________________________________
    μl protein solution         36  36  36  36  36  36  36  36  36
    μl Buffer A                 160 160 140 120 100 80  60  40  20
    μl Buffer B                 0   0   20  40  60  80  100 120 140
    μl CE                       0   4   4   4   4   4   4   4   4
    M urea                      0   0   1   2   3   4   5   6   7
    Ability to reduce wrong         ++  ++  ++  +++ +++ +++ +++ +++
    disulphide bridges
    Ability to reduce correct                           ++  +++ +++
    disulphide bridges
    __________________________________________________________________________
     Buffer A: 50 mM Tris.HCl pH 8, 0.5 M NaCl
     Buffer B: 10 M urea, 50 mM Tris.HCl pH 8, 0.5 M NaCl
     CE: 0.2 M L-cysteine ethyl ester
     Protein solution: 2 mg/ml hβ₂m, 50 mM Tris.HCl pH 8, 0.5 M NaCl

    __________________________________________________________________________
    Composition of samples used in SDS-PAGE of FIG. 29
    Test no.                    1   2   3   4   5   6   7   8   9
    __________________________________________________________________________
    μl protein solution         36  36  36  36  36  36  36  36  36
    μl Buffer A                 160 160 140 120 100 80  60  40  20
    μl Buffer B                 0   0   20  40  60  80  100 120 140
    μl ME                       0   4   4   4   4   4   4   4   4
    M urea                      0   0   1   2   3   4   5   6   7
    Ability to reduce wrong         ++  ++  ++  +++ +++ +++ +++ +++
    disulphide bridges
    Ability to reduce correct                       +   ++  +++ +++
    disulphide bridges
    __________________________________________________________________________
     Buffer A: 50 mM Tris.HCl pH 8, 0.5 M NaCl
     Buffer B: 10 M urea, 50 mM Tris.HCl pH 8, 0.5 M NaCl
     ME: 0.2 M 2-mercaptoethanol
     Protein solution: 2 mg/ml hβ₂m, 50 mM Tris.HCl pH 8, 0.5 M NaCl

    __________________________________________________________________________
    Composition of samples used in SDS-PAGE of FIG. 30
    Test no.                    1   2   3   4   5   6   7   8   9
    __________________________________________________________________________
    μl protein solution         36  36  36  36  36  36  36  36  36
    μl Buffer A                 160 160 140 120 100 80  60  40  20
    μl Buffer B                 0   0   20  40  60  80  100 120 140
    μl MSA                      0   4   4   4   4   4   4   4   4
    M urea                      0   0   1   2   3   4   5   6   7
    Ability to reduce wrong         ++  ++  ++  ++  ++  +++ +++ +++
    disulphide bridges
    Ability to reduce correct                           ++  +++ +++
    disulphide bridges
    __________________________________________________________________________
     Buffer A: 50 mM Tris.HCl pH 8, 0.5 M NaCl
     Buffer B: 10 M urea, 50 mM Tris.HCl pH 8, 0.5 M NaCl
     MSA: 0.2 M mercaptosuccinic acid
     Protein solution: 2 mg/ml hβ₂m, 50 mM Tris.HCl pH 8, 0.5 M NaCl

    __________________________________________________________________________
    Composition of samples used in SDS-PAGE of gel not shown
    Test no.                    1   2   3   4   5   6   7   8   9
    __________________________________________________________________________
    μl protein solution         36  36  36  36  36  36  36  36  36
    μl Buffer A                 160 160 140 120 100 80  60  40  20
    μl Buffer B                 0   0   20  40  60  80  100 120 140
    μl AC                       0   4   4   4   4   4   4   4   4
    M urea                      0   0   1   2   3   4   5   6   7
    Ability to reduce wrong         +   ++  ++  +++ +++ +++ +++ +++
    disulphide bridges
    Ability to reduce correct                   +   ++  +++ +++ +++
    disulphide bridges
    __________________________________________________________________________
     Buffer A: 50 mM Tris.HCl pH 8, 0.5 M NaCl
     Buffer B: 10 M urea, 50 mM Tris.HCl pH 8, 0.5 M NaCl
     AC: 0.2 M N-acetyl-L-cysteine
     Protein solution: 2 mg/ml hβ₂m, 50 mM Tris.HCl pH 8, 0.5 M NaCl
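
The pipetting volumes in the tables above can be checked against the stated final concentrations with a short calculation. The following is a minimal sketch only: it assumes ideal volumetric mixing, takes the stock concentrations from the table footnotes (10 M urea in Buffer B, 0.2 M reducing agent), and uses hypothetical names (`Sample`, `nine_test_series`) that do not appear in the original text.

```python
from dataclasses import dataclass

UREA_STOCK_M = 10.0   # urea concentration of Buffer B (table footnotes)
THIOL_STOCK_M = 0.2   # reducing-agent stock concentration (table footnotes)


@dataclass(frozen=True)
class Sample:
    """Pipetting volumes of one test sample, in microlitres."""
    protein_ul: float
    buffer_a_ul: float
    buffer_b_ul: float
    thiol_ul: float

    @property
    def total_ul(self) -> float:
        return self.protein_ul + self.buffer_a_ul + self.buffer_b_ul + self.thiol_ul

    @property
    def urea_molar(self) -> float:
        # Only Buffer B contributes urea; ideal mixing is assumed.
        return UREA_STOCK_M * self.buffer_b_ul / self.total_ul

    @property
    def thiol_millimolar(self) -> float:
        return 1000.0 * THIOL_STOCK_M * self.thiol_ul / self.total_ul


def nine_test_series() -> list:
    """Volumes shared by the nine-sample tables (FIGS. 28-30 and the gel not shown)."""
    buffer_a = [160, 160, 140, 120, 100, 80, 60, 40, 20]
    buffer_b = [0, 0, 20, 40, 60, 80, 100, 120, 140]
    thiol = [0, 4, 4, 4, 4, 4, 4, 4, 4]
    return [Sample(36, a, b, t) for a, b, t in zip(buffer_a, buffer_b, thiol)]


for number, sample in enumerate(nine_test_series(), start=1):
    print(f"test {number}: {sample.total_ul:.0f} μl, "
          f"{sample.urea_molar:.1f} M urea, "
          f"{sample.thiol_millimolar:.1f} mM reducing agent")
```

Under these assumptions every sample containing reducing agent has a total volume of 200 μl, so 4 μl of 0.2 M stock gives the 4 mM final concentration stated in the text, and each 20 μl of Buffer B adds 1 M urea.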

The different topological forms of β₂-m may be separated by non-reducing SDS-PAGE. The fastest migrating band represents the oxidized monomeric form. This band is immediately followed by the reduced β₂-m, which migrates slightly more slowly, whereas the multimeric forms of the protein migrate much more slowly in the gel.

In this analysis we probe the ability of each of the five reducing agents tested to reduce the disulphide bridges of multimeric forms of β₂-microglobulin without significantly reducing the correctly formed disulphide bridge of the monomeric oxidized form.

The results from the analyses (FIGS. 27-30, 31, and data not shown) are, in summary, as follows: N-acetyl-L-cysteine and mercaptosuccinic acid are, under the conditions used, essentially unable to discriminate between correct and incorrect disulphide bridges. Glutathione, cysteine ethyl ester and 2-mercaptoethanol are all capable, within 10 min and within individually characteristic ranges of urea concentration, of significantly reducing the disulphide bridges of multimeric forms while most of the oxidised monomeric β₂-m remains in the oxidised form. Glutathione clearly has the capacity to reduce incorrect disulphide bridges selectively at higher concentrations of urea than cysteine ethyl ester and 2-mercaptoethanol; among the thiols tested, glutathione would therefore be the reducing agent of choice for cyclic refolding of human β₂-microglobulin. As a consequence of these experiments, the concentration of urea in the reducing Buffer B for the refolding procedure used in Example 13 was lowered from 8 M (Example 1) to 6 M, which led to an improvement of the overall refolding yield of human β₂-microglobulin from 53% to 87%.

Example 13

Refolding of purified human β₂-microglobulin: Comparative analysis of three refolding procedures

The following set of experiments was undertaken to obtain comparable quantitative data to evaluate the importance of cycling for refolding yield, as compared with simple refolding procedures involving a stepwise or a gradual one-pass transition from strongly denaturing and reducing conditions to non-denaturing and non-reducing conditions.

Purified refolded recombinant human β₂-microglobulin fusion protein, obtained as described in EXAMPLE 1, was reduced and denatured to obtain starting materials devoid of impurities, such as proteolytic breakdown products or minor fractions of fusion protein damaged by irreversible oxidation or other chemical derivatization.

In a first step the optimization procedure described in EXAMPLE 12 was used to modify the conditions for cyclic refolding described in EXAMPLE 1 to increase the number of productive cycles. The optimized refolding protocol was identical to that described in EXAMPLE 1, as were buffers and other experimental parameters, except that the Buffer B in the present experiments was 6 M urea, 50 mM Tris-HCl pH 8, 0.5 M NaCl, 4 mM glutathione.

Three batches of pure fusion protein were refolded while attached to Ni⁺⁺-loaded NTA-agarose as described in EXAMPLE 1, using the present Buffer B composition. Batch one was subjected to buffer cycling as described in EXAMPLE 1. For batches two and three, cycling was replaced by a monotonic linear buffer gradient (100% B to 0% B over 24 hours) and a step gradient (100% B to 0% B in one step, followed by 0% B buffer for 24 hours), respectively. In each refolding experiment all of the polypeptide material was recovered as described in EXAMPLE 1, as a soluble fraction elutable under non-denaturing conditions and a remaining insoluble fraction elutable only under denaturing and reducing conditions. The yields of correctly folded fusion protein were then measured by quantitative densitometric analysis (Optical scanner HW and CS 370 Densitometric Analysis SW package from Hoeffer Scientific, Calif. USA) of Coomassie-stained SDS-PAGE gels on which suitably diluted, measured aliquots of soluble and insoluble fractions had been separated under reducing or non-reducing conditions, as required to allow separation of correctly disulphide-bridged monomer from soluble polymers in soluble fractions. Where required to obtain reliable densitometric data for both intense and faint bands in a gel lane, several sample dilutions were scanned and analyzed to obtain re-scaled data sets.
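
The re-scaling of densitometric data from several dilutions can be illustrated as follows. This is a minimal sketch and not the procedure of the densitometry package named above: it assumes that band signal is proportional to the amount of protein loaded within the linear range of the scanner, and the function names and numerical readings are hypothetical.

```python
def rescale(signal: float, dilution_factor: float) -> float:
    """Scale a band signal read from a diluted sample back to the undiluted sample."""
    return signal * dilution_factor


def fraction_of_total(band: float, all_bands: list) -> float:
    """Share of one band in the summed signal of a lane (all values on one scale)."""
    return band / sum(all_bands)


# Hypothetical readings: the intense monomer band is read from a 10-fold
# dilution, the faint multimer band from the undiluted lane.
monomer = rescale(signal=120.0, dilution_factor=10.0)
multimer = rescale(signal=300.0, dilution_factor=1.0)

share = fraction_of_total(monomer, [monomer, multimer])
print(f"monomer share of the lane: {share:.2f}")
```

Reading each band at a dilution where it lies within the linear range, and only then placing all bands on a common scale, avoids underestimating intense bands that saturate in the undiluted lane.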

Experimental details and results

Purified denatured and reduced fusion protein:

A batch of human β₂-microglobulin fusion protein was refolded as described in EXAMPLE 1. 96% of the fusion protein was recovered in the soluble fraction (FIG. 32, lanes 2-5). 56% of this soluble fraction was in the monomeric and disulphide-bridged form. Hence, the overall refolding efficiency obtained was 53%. Monomeric fusion protein was purified from multimers by ion exchange chromatography on S-Sepharose (Pharmacia, Sweden): the soluble fraction obtained after refolding was gel filtered on Sephadex G-25 (Pharmacia, Sweden) into a buffer containing 5 mM NaCl and 5 mM Tris-HCl pH 8, diluted to double volume with water and then applied to the S-Sepharose column, which was then eluted using a gradient (5 column volumes from 2.5 mM Tris-HCl pH 8, 2.5 mM NaCl to 25 mM Tris-HCl pH 8, 100 mM NaCl). The monomeric, correctly folded fusion protein, purified to >95% purity (FIG. 32, lanes 6 and 7), was then made 6 M in guanidinium hydrochloride and 0.1 M in DTE, gel filtered into a buffer containing 8 M urea, 50 mM Tris-HCl pH 8, 1 M NaCl and 10 mM 2-mercaptoethanol and then divided into aliquots to be used as starting material for the refolding experiments described below.

Cyclic refolding of purified fusion protein:

An aliquot of denatured reduced fusion protein was applied to a Ni⁺⁺-loaded NTA column, which was then washed with one column volume of a buffer containing 6 M guanidinium hydrochloride, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol.

The fusion protein was then subjected to buffer cycling according to the scheme shown in Table 1 using Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl and 3.2 mM/0.4 mM reduced/oxidized glutathione and Buffer B: 50 mM Tris-HCl pH 8, 0.5 M NaCl, 6 M urea and 4 mM reduced glutathione. After completion of buffer cycling the fusion protein was recovered quantitatively in a soluble form by elution of the column with a buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and 20 mM EDTA. 87% was obtained in the correct monomeric disulphide-bridged form (FIG. 32, lanes 8 and 9).

Refolding of purified fusion protein by linear gradient:

An aliquot of denatured reduced fusion protein was applied to a Ni⁺⁺-loaded NTA column, which was then washed with one column volume of a buffer containing 6 M guanidinium hydrochloride, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol, followed by one column volume of a buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl, 6 M urea and 4 mM reduced glutathione.

A 24 hour linear gradient from 100% B to 100% A was then applied at 2 ml/min, using Buffer A: 50 mM Tris-HCl pH 8, 0.5 M NaCl and 3.2 mM/0.4 mM reduced/oxidized glutathione and Buffer B: 50 mM Tris-HCl pH 8, 0.5 M NaCl, 6 M urea and 4 mM reduced glutathione. After completion of the gradient the soluble fraction of fusion protein was eluted in a buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and 20 mM EDTA. The remaining insoluble fraction was extracted from the column in a buffer containing 50 mM Tris-HCl pH 8, 1 M NaCl, 8 M urea, 10 mM 2-mercaptoethanol and 20 mM EDTA.
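
The conditions the column-bound protein experiences during this gradient can be sketched numerically. The example below is illustrative only: it assumes ideal linear mixing of Buffers A and B, ignores mixer and column dead volume, and uses hypothetical function names. Tris-HCl and NaCl are identical in both buffers and are therefore omitted.

```python
FLOW_ML_PER_MIN = 2.0    # flow rate stated in the text
GRADIENT_HOURS = 24.0    # gradient duration stated in the text

# Components that differ between the two buffers (values from the text).
BUFFER_A = {"urea_M": 0.0, "GSH_mM": 3.2, "GSSG_mM": 0.4}
BUFFER_B = {"urea_M": 6.0, "GSH_mM": 4.0, "GSSG_mM": 0.0}


def fraction_b(t_hours: float) -> float:
    """Fraction of Buffer B at time t for a linear 100% B to 100% A gradient."""
    t = min(max(t_hours, 0.0), GRADIENT_HOURS)
    return 1.0 - t / GRADIENT_HOURS


def composition(t_hours: float) -> dict:
    """Nominal composition of the mixed buffer at time t (ideal mixing assumed)."""
    fb = fraction_b(t_hours)
    return {key: fb * BUFFER_B[key] + (1.0 - fb) * BUFFER_A[key] for key in BUFFER_A}


def total_volume_ml() -> float:
    """Total buffer volume delivered over the whole gradient."""
    return FLOW_ML_PER_MIN * 60.0 * GRADIENT_HOURS


for hour in (0, 6, 12, 18, 24):
    c = composition(hour)
    print(f"{hour:2d} h: {c['urea_M']:.1f} M urea, "
          f"{c['GSH_mM']:.1f} mM GSH, {c['GSSG_mM']:.1f} mM GSSG")
print(f"total volume: {total_volume_ml():.0f} ml")
```

On these assumptions the urea concentration falls by 0.25 M per hour, and the gradient consumes 2880 ml of buffer in total. The protein passes through each intermediate condition only once, which is the point of contrast with the cyclic procedure.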

48% of the fusion protein was recovered in the soluble fraction and 60% of the soluble fraction was recovered in the correct monomeric disulphide-bridged form. The overall efficiency of folding obtained was therefore 29% (FIG. 33, lanes 5-7).

Refolding of purified fusion protein by buffer step:

An aliquot of denatured reduced fusion protein was applied to a Ni⁺⁺-loaded NTA column, which was then washed with one column volume of a buffer containing 6 M guanidinium hydrochloride, 50 mM Tris-HCl pH 8 and 10 mM 2-mercaptoethanol.

Buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and 3.2 mM/0.4 mM reduced/oxidized glutathione was then applied to the column at 2 ml/min for 24 hours before recovering the soluble fraction of fusion protein in a buffer containing 50 mM Tris-HCl pH 8, 0.5 M NaCl and 20 mM EDTA. The remaining insoluble fraction was extracted from the column in a buffer containing 50 mM Tris-HCl pH 8, 1 M NaCl, 8 M urea, 10 mM 2-mercaptoethanol and 20 mM EDTA.

34% of the fusion protein was recovered in the soluble fraction and 28% of the soluble fraction was recovered in the correct monomeric disulphide-bridged form. The overall efficiency of folding obtained was therefore 9.5% (FIG. 33, lanes 1-3).
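
The overall efficiencies quoted in this example are the product of the soluble fraction and the monomeric fraction within it. A minimal sketch of that arithmetic follows. The percentages are those reported above; the function name and dictionary labels are hypothetical, and the optimized cyclic procedure is entered as 100% soluble on the assumption that "recovered quantitatively in a soluble form" means complete recovery.

```python
def overall_yield(soluble_pct: float, monomer_pct: float) -> float:
    """Overall refolding yield in percent: soluble fraction times monomer share."""
    return soluble_pct * monomer_pct / 100.0


# (percent recovered as soluble protein, percent of soluble protein that is
# correctly disulphide-bridged monomer), as reported in the text.
procedures = {
    "cyclic, 8 M urea in Buffer B (EXAMPLE 1)": (96.0, 56.0),
    "cyclic, 6 M urea in Buffer B": (100.0, 87.0),
    "linear gradient": (48.0, 60.0),
    "buffer step": (34.0, 28.0),
}

for name, (soluble, monomer) in procedures.items():
    print(f"{name}: {overall_yield(soluble, monomer):.1f}%")
```

The products are 28.8% and 9.5% for the linear gradient and the buffer step, in line with the reported 29% and 9.5%. For the EXAMPLE 1 conditions the product of the rounded inputs is 53.8%, slightly above the reported 53%; the difference presumably reflects rounding of the underlying measurements.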

Conclusions

In summary, using human β₂-microglobulin as a model protein, it may be concluded that (a) straightforward buffer optimization and improved purification of fusion protein prior to cyclic refolding increased refolding yield significantly (from 53% to 87%) and (b) progressive denaturation-renaturation cycling is superior to single-pass refolding under otherwise comparable experimental conditions by a very large factor (87% versus 29% or 9.5% yields).

REFERENCES:

Christensen, J. H., Hansen, P. K., Lillelund, O., and Thøgersen, H. C. (1991). Sequence-specific binding of the N-terminal three-finger fragment of Xenopus transcription factor IIIA to the internal control region of a 5S RNA gene. FEBS Letters, 295: 181-184.

Dalbøge, H., Dahl, H.-H. M., Pedersen, J., Hansen, J. W., and Kristensen, T. (1987). A Novel Enzymatic Method for Production of Authentic hGH From an Escherichia coli Produced hGH-Precursor. Bio/Technology, 5: 161-164.

Datar, R. V., Cartwright, T., and Rosen, C.-G. (1993). Process Economics of Animal Cell and Bacterial Fermentations: A Case Study Analysis of Tissue Plasminogen Activator. Bio/Technology, 11: 349-357.

Herz, J., Hamann, U., Rogne, S., Myklebost, O., Gausepohl, H., and Stanley, K. K. (1988). Surface location and high affinity for calcium of a 500 kd liver membrane protein closely related to the LDL-receptor suggest a physiological role as lipoprotein receptor. EMBO J., 7: 4119-4127.

Hoffmann, H. J., Olsen, E., Etzerodt, M., Madsen, P., Thøgersen, H. C., Kruse, T., and Celis, J. E. (1994). Psoriasin Binds Calcium and Is Differentially Regulated With Respect to Other Members of the S100 Protein Family. J. Dermatol. Invest. in press.

Hochuli, E., Bannwarth, W., Dobeli, H., Gentz, R., and Stuber, D. (1988). Genetic approach to facilitate purification of recombinant proteins with a novel metal chelate absorbent. Bio/Technology, 6: 1321-1325.

Holliger, P., Prospero, T., and Winter, G. (1993). "Diabodies": Small bivalent and bispecific antibody fragments. Proc. Natl. Acad. Sci. USA, 90: 6444-6448.

Maniatis, T., Fritsch, E. F., and Sambrook, J. (1982). Molecular cloning. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

Nagai, K., and Thøgersen, H. C. (1987). Synthesis and Sequence-Specific Proteolysis of Hybrid Proteins Produced in Escherichia coli. Methods in Enzymology, 152: 461-481.

Nagai, K., Nakaseko, Y., Nasmyth, K., and Rhodes, D. (1988). Zinc-finger motifs expressed in E. coli and folded in vitro direct specific binding to DNA. Nature, 332: 284-286.

Nykjaer, A., Petersen, C. M., Møller, B., Jensen, P. H., Moestrup, S. K., Holtet, T. L., Etzerodt, M., Thøgersen, H. C., Munch, M., Andreasen, P. A., and Gliemann, J. (1992). Purified α₂-Macroglobulin Receptor/LDL Receptor-related Protein Binds Urokinase-Plasminogen Activator Inhibitor Type-I Complex. J. Biol. Chem., 267: 14543-14546.

Rathjen, D. et al. (1991), Mol. Immunol. 28, p29.

Rathjen, D. et al. (1992), Brit. J. Cancer 65, 852-856.

Saiki, R. K., Gelfand, D. H., Stoffel, S., Scharf, S. J., Higuchi, R., Horn, G. T., Mullis, K. B., and Erlich, H. A. (1988). Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science, 239: 487-491.

Studier, F. W., and Moffatt, B. A. (1986). Use of Bacteriophage T7 RNA Polymerase to Direct Selective High-level Expression of Cloned Genes. J. Mol. Biol., 189: 113-130.

The Regents of the University of California. Enterokinase-cleavable linker sequence. EP

    __________________________________________________________________________    #             SEQUENCE LISTING    - (1) GENERAL INFORMATION:    -    (iii) NUMBER OF SEQUENCES: 58    - (2) INFORMATION FOR SEQ ID NO: 1:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 1554 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: double              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: cDNA    -    (iii) HYPOTHETICAL: YES    -    (iii) ANTI-SENSE: NO    -     (vi) ORIGINAL SOURCE:              (A) ORGANISM: Bos tauru - #s    -     (ix) FEATURE:              (A) NAME/KEY: CDS              (B) LOCATION: 76..1551    #1:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - AGCCTGGGCG AGCGGACCTT GCCCTGGAGG CCTGTTGCGG CAGGGACTCA CG - #GCTGTCCT      60    #GTT CTG CTC AGC ACC     111GC CTG CTG CAT CTC    #Met Ala Gly Leu Leu His Leu Val Leu Leu S - #er Thr    #                10    - GCC CTG GGC GGC CTC CTG CGG CCG GCG GGG AG - #C GTG TTC CTG CCC CGG     159    Ala Leu Gly Gly Leu Leu Arg Pro Ala Gly Se - #r Val Phe Leu Pro Arg    #         25    - GAC CAG GCC CAC CGT GTC CTG CAG AGA GCC CG - #C AGG GCC AAC TCA TTC     207    Asp Gln Ala His Arg Val Leu Gln Arg Ala Ar - #g Arg Ala Asn Ser Phe    #     40    - TTG GAG GAG GTG AAG CAG GGA AAC CTG GAG CG - #A GAG TGC CTG GAG GAG     255    Leu Glu Glu Val Lys Gln Gly Asn Leu Glu Ar - #g Glu Cys Leu Glu Glu    # 60    - GCC TGC TCA CTA GAG GAG GCC CGC GAG GTC TT - #C GAG GAC GCA GAG CAG     303    Ala Cys Ser Leu Glu Glu Ala Arg Glu Val Ph - #e Glu Asp Ala Glu Gln    #                 75    - ACG GAT GAA TTC TGG AGT AAA TAC AAA GAT GG - #A GAC CAG TGT GAA GGC     351    Thr Asp Glu Phe Trp Ser Lys Tyr Lys Asp Gl - #y Asp Gln Cys Glu Gly    #             90    - CAC CCG TGC CTG AAT CAG GGC CAC TGT AAA GA - #C GGC ATC GGA GAC TAC     399    His Pro Cys Leu Asn Gln Gly His Cys Lys As - #p Gly Ile Gly Asp Tyr    #        105    - ACC TGC ACC TGT GCG GAA GGG TTT GAA GGC AA - #A AAC TGC GAG TTC 
TCC     447    Thr Cys Thr Cys Ala Glu Gly Phe Glu Gly Ly - #s Asn Cys Glu Phe Ser    #   120    - ACG CGT GAG ATC TGC AGC CTG GAC AAT GGA GG - #C TGC GAC CAG TTC TGC     495    Thr Arg Glu Ile Cys Ser Leu Asp Asn Gly Gl - #y Cys Asp Gln Phe Cys    125                 1 - #30                 1 - #35                 1 -    #40    - AGG GAG GAG CGC AGC GAG GTG CGG TGC TCC TG - #C GCG CAC GGC TAC GTG     543    Arg Glu Glu Arg Ser Glu Val Arg Cys Ser Cy - #s Ala His Gly Tyr Val    #               155    - CTG GGC GAC GAC AGC AAG TCC TGC GTG TCC AC - #A GAG CGC TTC CCC TGT     591    Leu Gly Asp Asp Ser Lys Ser Cys Val Ser Th - #r Glu Arg Phe Pro Cys    #           170    - GGG AAG TTC ACG CAG GGA CGC AGC CGG CGG TG - #G GCC ATC CAC ACC AGC     639    Gly Lys Phe Thr Gln Gly Arg Ser Arg Arg Tr - #p Ala Ile His Thr Ser    #       185    - GAG GAC GCG CTT GAC GCC AGC GAG CTG GAG CA - #C TAC GAC CCT GCA GAC     687    Glu Asp Ala Leu Asp Ala Ser Glu Leu Glu Hi - #s Tyr Asp Pro Ala Asp    #   200    - CTG AGC CCC ACA GAG AGC TCC TTG GAC CTG CT - #G GGC CTC AAC AGG ACC     735    Leu Ser Pro Thr Glu Ser Ser Leu Asp Leu Le - #u Gly Leu Asn Arg Thr    205                 2 - #10                 2 - #15                 2 -    #20    - GAG CCC AGC GCC GGG GAG GAC GGC AGC CAG GT - #G GTC CGG ATA GTG GGC     783    Glu Pro Ser Ala Gly Glu Asp Gly Ser Gln Va - #l Val Arg Ile Val Gly    #               235    - GGC AGG GAC TGC GCG GAG GGC GAG TGC CCA TG - #G CAG GCT CTG CTG GTC     831    Gly Arg Asp Cys Ala Glu Gly Glu Cys Pro Tr - #p Gln Ala Leu Leu Val    #           250    - AAC GAA GAG AAC GAG GGA TTC TGC GGG GGC AC - #C ATC CTG AAC GAG TTC     879    Asn Glu Glu Asn Glu Gly Phe Cys Gly Gly Th - #r Ile Leu Asn Glu Phe    #       265    - TAC GTC CTC ACG GCT GCC CAC TGC CTG CAC CA - #G GCC AAG AGG TTC ACG     927    Tyr Val Leu Thr Ala Ala His Cys Leu His Gl - #n Ala Lys Arg Phe Thr    #   280    - GTG AGG GTC GGC GAC CGG AAC ACA GAG CAG GA - #G GAG GGC AAC GAG ATG     975    
Val Arg Val Gly Asp Arg Asn Thr Glu Gln Gl - #u Glu Gly Asn Glu Met    285                 2 - #90                 2 - #95                 3 -    #00    - GCA CAC GAG GTG GAG ATG ACT GTG AAG CAC AG - #C CGC TTT GTC AAG GAG    1023    Ala His Glu Val Glu Met Thr Val Lys His Se - #r Arg Phe Val Lys Glu    #               315    - ACC TAC GAC TTC GAC ATC GCG GTG CTG AGG CT - #C AAG ACG CCC ATC CGG    1071    Thr Tyr Asp Phe Asp Ile Ala Val Leu Arg Le - #u Lys Thr Pro Ile Arg    #           330    - TTC CGC CGG AAC GTG GCG CCC GCC TGC CTG CC - #C GAG AAG GAC TGG GCG    1119    Phe Arg Arg Asn Val Ala Pro Ala Cys Leu Pr - #o Glu Lys Asp Trp Ala    #       345    - GAG GCC ACG CTG ATG ACC CAG AAG ACG GGC AT - #C GTC AGC GGC TTC GGG    1167    Glu Ala Thr Leu Met Thr Gln Lys Thr Gly Il - #e Val Ser Gly Phe Gly    #   360    - CGC ACG CAC GAG AAG GGC CGC CTG TCG TCC AC - #G CTC AAG ATG CTG GAG    1215    Arg Thr His Glu Lys Gly Arg Leu Ser Ser Th - #r Leu Lys Met Leu Glu    365                 3 - #70                 3 - #75                 3 -    #80    - GTG CCC TAC GTG GAC CGC AGC ACC TGT AAG CT - #G TCC AGC AGC TTC ACC    1263    Val Pro Tyr Val Asp Arg Ser Thr Cys Lys Le - #u Ser Ser Ser Phe Thr    #               395    - ATT ACG CCC AAC ATG TTC TGC GCC GGC TAC GA - #C ACC CAG CCC GAG GAC    1311    Ile Thr Pro Asn Met Phe Cys Ala Gly Tyr As - #p Thr Gln Pro Glu Asp    #           410    - GCC TGC CAG GGC GAC AGT GGC GGC CCC CAC GT - #C ACC CGC TTC AAG GAC    1359    Ala Cys Gln Gly Asp Ser Gly Gly Pro His Va - #l Thr Arg Phe Lys Asp    #       425    - ACC TAC TTC GTC ACA GGC ATC GTC AGC TGG GG - #A GAA GGG TGC GCG CGC    1407    Thr Tyr Phe Val Thr Gly Ile Val Ser Trp Gl - #y Glu Gly Cys Ala Arg    #   440    - AAG GGC AAG TTC GGC GTC TAC ACC AAG GTC TC - #C AAC TTC CTC AAG TGG    1455    Lys Gly Lys Phe Gly Val Tyr Thr Lys Val Se - #r Asn Phe Leu Lys Trp    445                 4 - #50                 4 - #55                 4 -    #60    - ATC GAC AAG ATC ATG AAG 
GCC AGG GCA GGG GC - #C GCG GGC AGC CGC GGC    1503    Ile Asp Lys Ile Met Lys Ala Arg Ala Gly Al - #a Ala Gly Ser Arg Gly    #               475    - CAC AGT GAA GCC CCT GCC ACC TGG ACG GTC CC - #G CCG CCC CTC CCC CTC    1551    His Ser Glu Ala Pro Ala Thr Trp Thr Val Pr - #o Pro Pro Leu Pro Leu    #           490    #           1554    - (2) INFORMATION FOR SEQ ID NO: 2:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 492 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #2:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - Met Ala Gly Leu Leu His Leu Val Leu Leu Se - #r Thr Ala Leu Gly Gly    #                 15    - Leu Leu Arg Pro Ala Gly Ser Val Phe Leu Pr - #o Arg Asp Gln Ala His    #             30    - Arg Val Leu Gln Arg Ala Arg Arg Ala Asn Se - #r Phe Leu Glu Glu Val    #         45    - Lys Gln Gly Asn Leu Glu Arg Glu Cys Leu Gl - #u Glu Ala Cys Ser Leu    #     60    - Glu Glu Ala Arg Glu Val Phe Glu Asp Ala Gl - #u Gln Thr Asp Glu Phe    # 80    - Trp Ser Lys Tyr Lys Asp Gly Asp Gln Cys Gl - #u Gly His Pro Cys Leu    #                 95    - Asn Gln Gly His Cys Lys Asp Gly Ile Gly As - #p Tyr Thr Cys Thr Cys    #           110    - Ala Glu Gly Phe Glu Gly Lys Asn Cys Glu Ph - #e Ser Thr Arg Glu Ile    #       125    - Cys Ser Leu Asp Asn Gly Gly Cys Asp Gln Ph - #e Cys Arg Glu Glu Arg    #   140    - Ser Glu Val Arg Cys Ser Cys Ala His Gly Ty - #r Val Leu Gly Asp Asp    145                 1 - #50                 1 - #55                 1 -    #60    - Ser Lys Ser Cys Val Ser Thr Glu Arg Phe Pr - #o Cys Gly Lys Phe Thr    #               175    - Gln Gly Arg Ser Arg Arg Trp Ala Ile His Th - #r Ser Glu Asp Ala Leu    #           190    - Asp Ala Ser Glu Leu Glu His Tyr Asp Pro Al - #a Asp Leu Ser Pro Thr    #       205    - Glu Ser Ser Leu Asp Leu Leu Gly Leu Asn Ar - #g Thr Glu Pro Ser Ala    #   220    - Gly Glu Asp Gly Ser Gln Val Val Arg Ile 
Va - #l Gly Gly Arg Asp Cys    225                 2 - #30                 2 - #35                 2 -    #40    - Ala Glu Gly Glu Cys Pro Trp Gln Ala Leu Le - #u Val Asn Glu Glu Asn    #               255    - Glu Gly Phe Cys Gly Gly Thr Ile Leu Asn Gl - #u Phe Tyr Val Leu Thr    #           270    - Ala Ala His Cys Leu His Gln Ala Lys Arg Ph - #e Thr Val Arg Val Gly    #       285    - Asp Arg Asn Thr Glu Gln Glu Glu Gly Asn Gl - #u Met Ala His Glu Val    #   300    - Glu Met Thr Val Lys His Ser Arg Phe Val Ly - #s Glu Thr Tyr Asp Phe    305                 3 - #10                 3 - #15                 3 -    #20    - Asp Ile Ala Val Leu Arg Leu Lys Thr Pro Il - #e Arg Phe Arg Arg Asn    #               335    - Val Ala Pro Ala Cys Leu Pro Glu Lys Asp Tr - #p Ala Glu Ala Thr Leu    #           350    - Met Thr Gln Lys Thr Gly Ile Val Ser Gly Ph - #e Gly Arg Thr His Glu    #       365    - Lys Gly Arg Leu Ser Ser Thr Leu Lys Met Le - #u Glu Val Pro Tyr Val    #   380    - Asp Arg Ser Thr Cys Lys Leu Ser Ser Ser Ph - #e Thr Ile Thr Pro Asn    385                 3 - #90                 3 - #95                 4 -    #00    - Met Phe Cys Ala Gly Tyr Asp Thr Gln Pro Gl - #u Asp Ala Cys Gln Gly    #               415    - Asp Ser Gly Gly Pro His Val Thr Arg Phe Ly - #s Asp Thr Tyr Phe Val    #           430    - Thr Gly Ile Val Ser Trp Gly Glu Gly Cys Al - #a Arg Lys Gly Lys Phe    #       445    - Gly Val Tyr Thr Lys Val Ser Asn Phe Leu Ly - #s Trp Ile Asp Lys Ile    #   460    - Met Lys Ala Arg Ala Gly Ala Ala Gly Ser Ar - #g Gly His Ser Glu Ala    465                 4 - #70                 4 - #75                 4 -    #80    - Pro Ala Thr Trp Thr Val Pro Pro Pro Leu Pr - #o Leu    #               490    - (2) INFORMATION FOR SEQ ID NO: 3:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 42 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: single              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: DNA (synthetic) 
   #3:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    #  42              AGGG TAGAATCCAG CGTACTCCAA AG    - (2) INFORMATION FOR SEQ ID NO: 4:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 23 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: single              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: DNA (synthetic)    #4:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    #                23TGTC TCG    - (2) INFORMATION FOR SEQ ID NO: 5:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 44 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: single              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: DNA (synthetic)    #5:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    # 44               AGGG TAGAATCCAG AAAACCCCTC AAAT    - (2) INFORMATION FOR SEQ ID NO: 6:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 22 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: single              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: DNA (synthetic)    #6:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    #                 22CGA TC    - (2) INFORMATION FOR SEQ ID NO: 7:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 40 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: single              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: DNA (synthetic)    #7:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    #    40            GTAG GTTCCCAACC ATTCCCTTAT    - (2) INFORMATION FOR SEQ ID NO: 8:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 26 base              (B) TYPE: nucleic acid              (C) STRANDEDNESS: single              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: DNA (synthetic)    #8:   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    #              26  ACAG CTGCCC    - (2) INFORMATION FOR SEQ ID NO: 9:    -      (i) SEQUENCE CHARACTERISTICS:    #pairs    (A) LENGTH: 39 base              (B) TYPE: nucleic 
acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AGGG TAGGTACTCG CGGGAGAAG    39

(2) INFORMATION FOR SEQ ID NO: 10:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 26 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
GTTC GTTGTG    26

(2) INFORMATION FOR SEQ ID NO: 11:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
AGGG TAGGGCTATC GACGCCCCTA AG    42

(2) INFORMATION FOR SEQ ID NO: 12:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
GCAG TGGGGCCCCT    30

(2) INFORMATION FOR SEQ ID NO: 13:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 29 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CTTG CAGGAGCGG    29

(2) INFORMATION FOR SEQ ID NO: 14:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 32 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CTTG CATGACTTCC CG    32

(2) INFORMATION FOR SEQ ID NO: 15:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
AGGG TAGGGGCACC AACAAATGCC GG    42

(2) INFORMATION FOR SEQ ID NO: 16:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 29 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
CAGG CTGCGGCAG    29

(2) INFORMATION FOR SEQ ID NO: 17:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 41 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
AGGG TAGGGTGCCT CCACCCCAGT G    41

(2) INFORMATION FOR SEQ ID NO: 18:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 29 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GTCG CAGAGCTCG    29

(2) INFORMATION FOR SEQ ID NO: 19:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 46 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
TAG GGGTGGTCAG TGCTCTCTGA ATAACG    46

(2) INFORMATION FOR SEQ ID NO: 20:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 29 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CTCA TAGCAGGTG    29

(2) INFORMATION FOR SEQ ID NO: 21:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 44 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
AGGG TAGGGCGGTG AATTCCTCTT GCCG    44

(2) INFORMATION FOR SEQ ID NO: 22:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GTGG CAGCCACGCT    30

(2) INFORMATION FOR SEQ ID NO: 23:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
AGGG TAGGGTGTCC AACTGCACGG CT    42

(2) INFORMATION FOR SEQ ID NO: 24:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
GCTG CAGTCCTCCT    30

(2) INFORMATION FOR SEQ ID NO: 25:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 47 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
AGGG TAGGAGTAAA TACAAAGATG GAGACCA    47

(2) INFORMATION FOR SEQ ID NO: 26:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 30 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
GGTG GCAGGGGCTT    30

(2) INFORMATION FOR SEQ ID NO: 27:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 62 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CTGCCTGGAT CCATCGAGGG TAGGAAAGTG TATCTCTCAT CAGAGTGCAA GACTGGGAAT    60
                                                                     62

(2) INFORMATION FOR SEQ ID NO: 28:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 33 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
ACAC TCAAGAATGT CGC    33

(2) INFORMATION FOR SEQ ID NO: 29:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 42 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
AGGG TAGGGTCCAG GACTGCTACC AT    42

(2) INFORMATION FOR SEQ ID NO: 30:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 31 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
TTCT GTTCCTGAGC A    31

(2) INFORMATION FOR SEQ ID NO: 31:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 40 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
GTAG GGTCTACCTC CAGACATCCT    40

(2) INFORMATION FOR SEQ ID NO: 32:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 26 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TTCC AAGATC    26

(2) INFORMATION FOR SEQ ID NO: 33:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 39 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GTAG GGGCGAGCCA CCAACCCAG    39

(2) INFORMATION FOR SEQ ID NO: 34:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 25 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
CCCG AACTG    25

(2) INFORMATION FOR SEQ ID NO: 35:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 38 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
GTAG GCAGGTCAAA CTGCAGCA    38

(2) INFORMATION FOR SEQ ID NO: 36:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 29 base pairs
          (B) TYPE: nucleic acid
          (C) STRANDEDNESS: single
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: DNA (synthetic)
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
ATCC TCTTCTGAG    29

(2) INFORMATION FOR SEQ ID NO: 37:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 6 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
Gly Ser Ile Glu Gly Arg
1               5

(2) INFORMATION FOR SEQ ID NO: 38:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
Ile Glu Gly Arg

(2) INFORMATION FOR SEQ ID NO: 39:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
Tyr Trp Thr Asp
1

(2) INFORMATION FOR SEQ ID NO: 40:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
Ile Gln Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 41:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
Ala Glu Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 42:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
Ala Gln Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 43:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
Ile Cys Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 44:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
Ala Cys Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 45:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
Ile Met Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 46:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
Ala Met Gly Arg
1

(2) INFORMATION FOR SEQ ID NO: 47:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 6 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:
His His His His His His
1               5

(2) INFORMATION FOR SEQ ID NO: 48:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 15 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: peptide
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
Met Gly Ser His His His His His His Gly Ser Ile Glu Gly Arg
1               5                   10                  15

(2) INFORMATION FOR SEQ ID NO: 49:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 119 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
Met Ser Arg Ser Val Ala Leu Ala Val Leu Ala Leu Leu Ser Leu Ser
1               5                   10                  15
Gly Leu Glu Ala Ile Gln Arg Thr Pro Lys Ile Gln Val Tyr Ser Arg
            20                  25                  30
His Pro Ala Glu Asn Gly Lys Ser Asn Phe Leu Asn Cys Tyr Val Ser
        35                  40                  45
Gly Phe His Pro Ser Asp Ile Glu Val Asp Leu Leu Lys Asn Gly Glu
    50                  55                  60
Arg Ile Glu Lys Val Glu His Ser Asp Leu Ser Phe Ser Lys Asp Trp
65                  70                  75                  80
Ser Phe Tyr Leu Leu Tyr Tyr Thr Glu Phe Thr Pro Thr Glu Lys Asp
                85                  90                  95
Glu Tyr Ala Cys Arg Val Asn His Val Thr Leu Ser Gln Pro Lys Ile
            100                 105                 110
Val Lys Trp Asp Arg Asp Met
        115

(2) INFORMATION FOR SEQ ID NO: 50:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 119 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:
Met Ala Arg Ser Val Thr Leu Val Phe Leu Val Leu Val Ser Leu Thr
1               5                   10                  15
Gly Leu Tyr Ala Ile Gln Lys Thr Pro Gln Ile Gln Val Tyr Ser Arg
            20                  25                  30
His Pro Pro Glu Asn Gly Lys Pro Asn Ile Leu Asn Cys Tyr Val Thr
        35                  40                  45
Gln Phe His Pro Pro His Ile Glu Ile Gln Met Leu Lys Asn Gly Lys
    50                  55                  60
Lys Ile Pro Lys Val Glu Met Ser Asp Met Ser Phe Ser Lys Asp Trp
65                  70                  75                  80
Ser Phe Tyr Ile Leu Ala His Thr Glu Phe Thr Pro Thr Glu Thr Asp
                85                  90                  95
Thr Tyr Ala Cys Arg Val Lys His Asp Ser Met Ala Glu Pro Lys Thr
            100                 105                 110
Val Tyr Trp Asp Arg Asp Met
        115

(2) INFORMATION FOR SEQ ID NO: 51:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 217 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
Met Ala Thr Gly Ser Arg Thr Ser Leu Leu Leu Ala Phe Gly Leu Leu
1               5                   10                  15
Cys Leu Pro Trp Leu Gln Glu Gly Ser Ala Phe Pro Thr Ile Pro Leu
            20                  25                  30
Ser Arg Leu Phe Asp Asn Ala Ser Leu Arg Ala His Arg Leu His Gln
        35                  40                  45
Leu Ala Phe Asp Thr Tyr Gln Glu Phe Glu Glu Ala Tyr Ile Pro Lys
    50                  55                  60
Glu Gln Lys Tyr Ser Phe Leu Gln Asn Pro Gln Thr Ser Leu Cys Phe
65                  70                  75                  80
Ser Glu Ser Ile Pro Thr Pro Ser Asn Arg Glu Glu Thr Gln Gln Lys
                85                  90                  95
Ser Asn Leu Glu Leu Leu Arg Ile Ser Leu Leu Leu Ile Gln Ser Trp
            100                 105                 110
Leu Glu Pro Val Gln Phe Leu Arg Ser Val Phe Ala Asn Ser Leu Val
        115                 120                 125
Tyr Gly Ala Ser Asp Ser Asn Val Tyr Asp Leu Leu Lys Asp Leu Glu
    130                 135                 140
Glu Gly Ile Gln Thr Leu Met Gly Arg Leu Glu Asp Gly Ser Pro Arg
145                 150                 155                 160
Thr Gly Gln Ile Phe Lys Gln Thr Tyr Ser Lys Phe Asp Thr Asn Ser
                165                 170                 175
His Asn Asp Asp Ala Leu Leu Lys Asn Tyr Gly Leu Leu Tyr Cys Phe
            180                 185                 190
Arg Lys Asp Met Asp Lys Val Glu Thr Phe Leu Arg Ile Val Gln Cys
        195                 200                 205
Arg Ser Val Glu Gly Ser Cys Gly Phe
    210                 215

(2) INFORMATION FOR SEQ ID NO: 52:
     (i) SEQUENCE CHARACTERISTICS:
          (A) LENGTH: 4544 amino acids
          (B) TYPE: amino acid
          (C) STRANDEDNESS:
          (D) TOPOLOGY: linear
    (ii) MOLECULE TYPE: protein
    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
Met Leu Thr Pro Pro Leu Leu Leu Leu Leu Pro Leu Leu Ser Ala Leu
1               5                   10                  15
Val Ala Ala Ala Ile Asp Ala Pro Lys Thr Cys Ser Pro Lys Gln Phe
            20                  25                  30
Ala Cys Arg Asp Gln Ile Thr Cys Ile Ser Lys Gly Trp Arg Cys Asp
        35                  40                  45
Gly Glu Arg Asp Cys Pro Asp Gly Ser Asp Glu Ala Pro Glu Ile Cys
    50                  55                  60
Pro Gln Ser Lys Ala Gln Arg Cys Gln Pro Asn Glu His Asn Cys Leu
65                  70                  75                  80
Gly Thr Glu Leu Cys Val Pro Met Ser Arg Leu Cys Asn Gly Val Gln
                85                  90                  95
Asp Cys Met Asp Gly Ser Asp Glu Gly Pro His Cys Arg Glu Leu Gln
            100                 105                 110
Gly Asn Cys Ser Arg Leu Gly Cys Gln His His Cys Val Pro Thr Leu
        115                 120                 125
Asp Gly Pro Thr Cys Tyr Cys Asn Ser Ser Phe Gln Leu Gln Ala Asp
    130                 135                 140
Gly Lys Thr Cys Lys Asp Phe Asp Glu Cys Ser Val Tyr Gly Thr Cys
145                 150                 155                 160
Ser Gln Leu Cys Thr Asn Thr Asp Gly Ser Phe Ile Cys Gly Cys Val
                165                 170                 175
Glu Gly Tyr Leu Leu Gln Pro Asp Asn Arg Ser Cys Lys Ala Lys Asn
            180                 185                 190
Glu Pro Val Asp Arg Pro Pro Val Leu Leu Ile Ala Asn Ser Gln Asn
        195                 200                 205
Ile Leu Ala Thr Tyr Leu Ser Gly Ala Gln Val Ser Thr Ile Thr Pro
    210                 215                 220
Thr Ser Thr Arg Gln Thr Thr Ala Met Asp Phe Ser Tyr Ala Asn Glu
225                 230                 235                 240
Thr Val Cys Trp Val His Val Gly Asp Ser Ala Ala Gln Thr Gln Leu
                245                 250                 255
Lys Cys Ala Arg Met Pro Gly Leu Lys Gly Phe Val Asp Glu His Thr
            260                 265                 270
Ile Asn Ile Ser Leu Ser Leu His His Val Glu Gln Met Ala Ile Asp
        275                 280                 285
Trp Leu Thr Gly Asn Phe Tyr Phe Val Asp Asp Ile Asp Asp Arg Ile
    290                 295                 300
Phe Val Cys Asn Arg Asn Gly Asp Thr Cys Val Thr Leu Leu Asp Leu
305                 310                 315                 320
Glu Leu Tyr Asn Pro Lys Gly Ile Ala Leu Asp Pro Ala Met Gly Lys
                325                 330                 335
Val Phe Phe Thr Asp Tyr Gly Gln Ile Pro Lys Val Glu Arg Cys Asp
            340                 345                 350
Met Asp Gly Gln Asn Arg Thr Lys Leu Val Asp Ser Lys Ile Val Phe
        355                 360                 365
Pro His Gly Ile Thr Leu Asp Leu Val Ser Arg Leu Val Tyr Trp Ala
    370                 375                 380
Asp Ala Tyr Leu Asp Tyr Ile Glu Val Val Asp Tyr Glu Gly Lys Gly
385                 390                 395                 400
Arg Gln Thr Ile Ile Gln Gly Ile Leu Ile Glu His Leu Tyr Gly Leu
                405                 410                 415
Thr Val Phe Glu Asn Tyr Leu Tyr Ala Thr Asn Ser Asp Asn Ala Asn
            420                 425                 430
Ala Gln Gln Lys Thr Ser Val Ile Arg Val Asn Arg Phe Asn Ser Thr
        435                 440                 445
Glu Tyr Gln Val Val Thr Arg Val Asp Lys Gly Gly Ala Leu His Ile
    450                 455                 460
Tyr His Gln Arg Arg Gln Pro Arg Val Arg Ser His Ala Cys Glu Asn
465                 470                 475                 480
Asp Gln Tyr Gly Lys Pro Gly Gly Cys Ser Asp Ile Cys Leu Leu Ala
                485                 490                 495
Asn Ser His Lys Ala Arg Thr Cys Arg Cys Arg Ser Gly Phe Ser Leu
            500                 505                 510
Gly Ser Asp Gly Lys Ser Cys Lys Lys Pro Glu His Glu Leu Phe Leu
        515                 520                 525
Val Tyr Gly Lys Gly Arg Pro Gly Ile Ile Arg Gly Met Asp Met Gly
    530                 535                 540
Ala Lys Val Pro Asp Glu His Met Ile Pro Ile Glu Asn Leu Met Asn
545                 550                 555                 560
Pro Arg Ala Leu Asp Phe His Ala Glu Thr Gly Phe Ile Tyr Phe Ala
                565                 570                 575
Asp Thr Thr Ser Tyr Leu Ile Gly Arg Gln Lys Ile Asp Gly Thr Glu
            580                 585                 590
Arg Glu Thr Ile Leu Lys Asp Gly Ile His Asn Val Glu Gly Val Ala
        595                 600                 605
Val Asp Trp Met Gly Asp Asn Leu Tyr Trp Thr Asp Asp Gly Pro Lys
    610                 615                 620
Lys Thr Ile Ser Val Ala Arg Leu Glu Lys Ala Ala Gln Thr Arg Lys
625                 630                 635                 640
Thr Leu Ile Glu Gly Lys Met Thr His Pro Arg Ala Ile Val Val Asp
                645                 650                 655
Pro Leu Asn Gly Trp Met Tyr Trp Thr Asp Trp Glu Glu Asp Pro Lys
            660                 665                 670
Asp Ser Arg Arg Gly Arg Leu Glu Arg Ala Trp Met Asp Gly Ser His
        675                 680                 685
Arg Asp Ile Phe Val Thr Ser Lys Thr Val Leu Trp Pro Asn Gly Leu
    690                 695                 700
Ser Leu Asp Ile Pro Ala Gly Arg Leu Tyr Trp Val Asp Ala Phe Tyr
705                 710                 715                 720
Asp Arg Ile Glu Thr Ile Leu Leu Asn Gly Thr Asp Arg Lys Ile Val
                725                 730                 735
Tyr Glu Gly Pro Glu Leu Asn His Ala Phe Gly Leu Cys His His Gly
            740                 745                 750
Asn Tyr Leu Phe Trp Thr Glu Tyr Arg Ser Gly Ser Val Tyr Arg Leu
        755                 760                 765
Glu Arg Gly Val Gly Gly Ala Pro Pro Thr Val Thr Leu Leu Arg Ser
    770                 775                 780
Glu Arg Pro Pro Ile Phe Glu Ile Arg Met Tyr Asp Ala Gln Gln Gln
785                 790                 795                 800
Gln Val Gly Thr Asn Lys Cys Arg Val Asn Asn Gly Gly Cys Ser Ser
                805                 810                 815
Leu Cys Leu Ala Thr Pro Gly Ser Arg Gln Cys Ala Cys Ala Glu Asp
            820                 825                 830
Gln Val Leu Asp Ala Asp Gly Val Thr Cys Leu Ala Asn Pro Ser Tyr
        835                 840                 845
Val Pro Pro Pro Gln Cys Gln Pro Gly Glu Phe Ala Cys Ala Asn Ser
    850                 855                 860
Arg Cys Ile Gln Glu Arg Trp Lys Cys Asp Gly Asp Asn Asp Cys Leu
865                 870                 875                 880
Asp Asn Ser Asp Glu Ala Pro Ala Leu Cys His Gln His Thr Cys Pro
                885                 890                 895
Ser Asp Arg Phe Lys Cys Glu Asn Asn Arg Cys Ile Pro Asn Arg Trp
            900                 905                 910
Leu Cys Asp Gly Asp Asn Asp Cys Gly Asn Ser Glu Asp Glu Ser Asn
        915                 920                 925
Ala Thr Cys Ser Ala Arg Thr Cys Pro Pro Asn Gln Phe Ser Cys Ala
    930                 935                 940
Ser Gly Arg Cys Ile Pro Ile Ser Trp Thr Cys Asp Leu Asp Asp Asp
945                 950                 955                 960
Cys Gly Asp Arg Ser Asp Glu Ser Ala Ser Cys Ala Tyr Pro Thr Cys
                965                 970                 975
Phe Pro Leu Thr Gln Phe Thr Cys Asn Asn Gly Arg Cys Ile Asn Ile
            980                 985                 990
Asn Trp Arg Cys Asp Asn Asp Asn Asp Cys Gly Asp Asn Ser Asp Glu
        995                 1000                1005
Ala Gly Cys Ser His Ser Cys Ser Ser Thr Gln Phe Lys Cys Asn Ser
    1010                1015                1020
Gly Arg Cys Ile Pro Glu His Trp Thr Cys Asp Gly Asp Asn Asp Cys
1025                1030                1035                1040
Gly Asp Tyr Ser Asp Glu Thr His Ala Asn Cys Thr Asn Gln Ala Thr
                1045                1050                1055
Arg Pro Pro Gly Gly Cys His Thr Asp Glu Phe Gln Cys Arg Leu Asp
            1060                1065                1070
Gly Leu Cys Ile Pro Leu Arg Trp Arg Cys Asp Gly Asp Thr Asp Cys
        1075                1080                1085
Met Asp Ser Ser Asp Glu Lys Ser Cys Glu Gly Val Thr His Val Cys
    1090                1095                1100
Asp Pro Ser Val Lys Phe Gly Cys Lys Asp Ser Ala Arg Cys Ile Ser
1105                1110                1115                1120
Lys Ala Trp Val Cys Asp Gly Asp Asn Asp Cys Glu Asp Asn Ser Asp
                1125                1130                1135
Glu Glu Asn Cys Glu Ser Leu Ala Cys Arg Pro Pro Ser His Pro Cys
            1140                1145                1150
Ala Asn Asn Thr Ser Val Cys Leu Pro Pro Asp Lys Leu Cys Asp Gly
        1155                1160                1165
Asn Asp Asp Cys Gly Asp Gly Ser Asp Glu Gly Glu Leu Cys Asp Gln
    1170                1175                1180
Cys Ser Leu Asn Asn Gly Gly Cys Ser His Asn Cys Ser Val Ala Pro
1185                1190                1195                1200
Gly Glu Gly Ile Val Cys Ser Cys Pro Leu Gly Met Glu Leu Gly Pro
                1205                1210                1215
Asp Asn His Thr Cys Gln Ile Gln Ser Tyr Cys Ala Lys His Leu Lys
            1220                1225                1230
Cys Ser Gln Lys Cys Asp Gln Asn Lys Phe Ser Val Lys Cys Ser Cys
        1235                1240                1245
Tyr Glu Gly Trp Val Leu Glu Pro Asp Gly Glu Ser Cys Arg Ser Leu
    1250                1255                1260
Asp Pro Phe Lys Pro Phe Ile Ile Phe Ser Asn Arg His Glu Ile Arg
1265                1270                1275                1280
Arg Ile Asp Leu His Lys Gly Asp Tyr Ser Val Leu Val Pro Gly Leu
                1285                1290                1295
Arg Asn Thr Ile Ala Leu Asp Phe His Leu Ser Gln Ser Ala Leu Tyr
            1300                1305                1310
Trp Thr Asp Val Val Glu Asp Lys Ile Tyr Arg Gly Lys Leu Leu Asp
        1315                1320                1325
Asn Gly Ala Leu Thr Ser Phe Glu Val Val Ile Gln Tyr Gly Leu Ala
    1330                1335                1340
Thr Pro Glu Gly Leu Ala Val Asp Trp Ile Ala Gly Asn Ile Tyr Trp
1345                1350                1355                1360
Val Glu Ser Asn Leu Asp Gln Ile Glu Val Ala Lys Leu Asp Gly Thr
                1365                1370                1375
Leu Arg Thr Thr Leu Leu Ala Gly Asp Ile Glu His Pro Arg Ala Ile
            1380                1385                1390
Ala Leu Asp Pro Arg Asp Gly Ile Leu Phe Trp Thr Asp Trp Asp Ala
        1395                1400                1405
Ser Leu Pro Arg Ile Glu Ala Ala Ser Met Ser Gly Ala Gly Arg Arg
    1410                1415                1420
Thr Val His Arg Glu Thr Gly Ser Gly Gly Trp Pro Asn Gly Leu Thr
1425                1430                1435                1440
Val Asp Tyr Leu Glu Lys Arg Ile Leu Trp Ile Asp Ala Arg Ser Asp
                1445                1450                1455
Ala Ile Tyr Ser Ala Arg Tyr Asp Gly Ser Gly His Met Glu Val Leu
            1460                1465                1470
Arg Gly His Glu Phe Leu Ser His Pro Phe Ala Val Thr Leu Tyr Gly
        1475                1480                1485
Gly Glu Val Tyr Trp Thr Asp Trp Arg Thr Asn Thr Leu Ala Lys Ala
    1490                1495                1500
Asn Lys Trp Thr Gly His Asn Val Thr Val Val Gln Arg Thr Asn Thr
1505                1510                1515                1520
Gln Pro Phe Asp Leu Gln Val Tyr His Pro Ser Arg Gln Pro Met Ala
                1525                1530                1535
Pro Asn Pro Cys Glu Ala Asn Gly Gly Gln Gly Pro Cys Ser His Leu
            1540                1545                1550
Cys Leu Ile Asn Tyr Asn Arg Thr Val Ser Cys Ala Cys Pro His Leu
        1555                1560                1565
Met Lys Leu His Lys Asp Asn Thr Thr Cys Tyr Glu Phe Lys Lys Phe
    1570                1575                1580
Leu Leu Tyr Ala Arg Gln Met Glu Ile Arg Gly Val Asp Leu Asp Ala
1585                1590                1595                1600
Pro Tyr Tyr Asn Tyr Ile Ile Ser Phe Thr Val Pro Asp Ile Asp Asn
                1605                1610                1615
Val Thr Val Leu Asp Tyr Asp Ala Arg Glu Gln Arg Val Tyr Trp Ser
            1620                1625                1630
Asp Val Arg Thr Gln Ala Ile Lys Arg Ala Phe Ile Asn Gly Thr Gly
        1635                1640                1645
Val Glu Thr Val Val Ser Ala Asp Leu Pro Asn Ala His Gly Leu Ala
    1650                1655                1660
Val Asp Trp Val Ser Arg Asn Leu Phe Trp Thr Ser Tyr Asp Thr Asn
1665                1670                1675                1680
Lys Lys Gln Ile Asn Val Ala Arg Leu Asp Gly Ser Phe Lys Asn Ala
                1685                1690                1695
Val Val Gln Gly Leu Glu Gln Pro His Gly Leu Val Val His Pro Leu
            1700                1705                1710
Arg Gly Lys Leu Tyr Trp Thr Asp Gly Asp Asn Ile Ser Met Ala Asn
        1715                1720                1725
Met Asp Gly Ser Asn Arg Thr Leu Leu Phe Ser Gly Gln Lys Gly Pro
    1730                1735                1740
Val Gly Leu Ala Ile Asp Phe Pro Glu Ser Lys Leu Tyr Trp Ile Ser
1745                1750                1755                1760
Ser Gly Asn His Thr Ile Asn Arg Cys Asn Leu Asp Gly Ser Gly Leu
                1765                1770                1775
Glu Val Ile Asp Ala Met Arg Ser Gln Leu Gly Lys Ala Thr Ala Leu
            1780                1785                1790
Ala Ile Met Gly Asp Lys Leu Trp Trp Ala Asp Gln Val Ser Glu Lys
        1795                1800                1805
Met Gly Thr Cys Ser Lys Ala Asp Gly Ser Gly Ser Val Val Leu Arg
    1810                1815                1820
Asn Ser Thr Thr Leu Val Met His Met Lys Val Tyr Asp Glu Ser Ile
1825                1830                1835                1840
Gln Leu Asp His Lys Gly Thr Asn Pro Cys Ser Val Asn Asn Gly Asp
                1845                1850                1855
Cys Ser Gln Leu Cys Leu Pro Thr Ser Glu Thr Thr Arg Ser Cys Met
            1860                1865                1870
Cys Thr Ala Gly Tyr Ser Leu Arg Ser Gly Gln Gln Ala Cys Glu Gly
        1875                1880                1885
Val Gly Ser Phe Leu Leu Tyr Ser Val His Glu Gly Ile Arg Gly Ile
    1890                1895                1900
Pro Leu Asp Pro Asn Asp Lys Ser Asp Ala Leu Val Pro Val Ser Gly
1905                1910                1915                1920
Thr Ser Leu Ala Val Gly Ile Asp Phe His Ala Glu Asn Asp Thr Ile
                1925                1930                1935
Tyr Trp Val Asp Met Gly Leu Ser Thr Ile Ser Arg Ala Lys Arg Asp
            1940                1945                1950
Gln Thr Trp Arg Glu Asp Val Val Thr Asn Gly Ile Gly Arg Val Glu
        1955                1960                1965
Gly Ile Ala Val Asp Trp Ile Ala Gly Asn Ile Tyr Trp Thr Asp Gln
    1970                1975                1980
Gly Phe Asp Val Ile Glu Val Ala Arg Leu Asn Gly Ser Phe Arg Tyr
1985                1990                1995                2000
Val Val Ile Ser Gln Gly Leu Asp Lys Pro Arg Ala Ile Thr Val His
                2005                2010                2015
Pro Glu Lys Gly Tyr Leu Phe Trp Thr Glu Trp Gly Gln Tyr Pro Arg
            2020                2025                2030
Ile Glu Arg Ser Arg Leu Asp Gly Thr Glu Arg Val Val Leu Val Asn
        2035                2040                2045
Val Ser Ile Ser Trp Pro Asn Gly Ile Ser Val Asp Tyr Gln Asp Gly
    2050                2055                2060
Lys Leu Tyr Trp Cys Asp Ala Arg Thr Asp Lys Ile Glu Arg Ile Asp
2065                2070                2075                2080
Leu Glu Thr Gly Glu Asn Arg Glu Val Val Leu Ser Ser Asn Asn Met
                2085                2090                2095
Asp Met Phe Ser Val Ser Val Phe Glu Asp Phe Ile Tyr Trp Ser Asp
            2100                2105                2110
Arg Thr His Ala Asn Gly Ser Ile Lys Arg Gly Ser Lys Asp Asn Ala
        2115                2120                2125
Thr Asp Ser Val Pro Leu Arg Thr Gly Ile Gly Val Gln Leu Lys Asp
    2130                2135                2140
Ile Lys Val Phe Asn Arg Asp Arg Gln Lys Gly Thr Asn Val Cys Ala
2145                2150                2155                2160
Val Ala Asn Gly Gly Cys Gln Gln Leu Cys Leu Tyr Arg Gly Arg Gly
                2165                2170                2175
Gln Arg Ala Cys Ala Cys Ala His Gly Met Leu Ala Glu Asp Gly Ala
            2180                2185                2190
Ser Cys Arg Glu Tyr Ala Gly Tyr Leu Leu Tyr Ser Glu Arg Thr Ile
        2195                2200                2205
Leu Lys Ser Ile His Leu Ser Asp Glu Arg Asn Leu Asn Ala Pro Val
    2210                2215                2220
Gln Pro Phe Glu Asp Pro Glu His Met Lys Asn Val Ile Ala Leu Ala
2225                2230                2235                2240
Phe Asp Tyr Arg Ala Gly Thr Ser Pro Gly Thr Pro Asn Arg Ile Phe
                2245                2250                2255
Phe Ser Asp Ile His Phe Gly Asn Ile Gln Gln Ile Asn Asp Asp Gly
            2260                2265                2270
Ser Arg Arg Ile Thr Ile Val Glu Asn Val Gly Ser Val Glu Gly Leu
        2275                2280                2285
Ala Tyr His Arg Gly Trp Asp Thr Leu Tyr Trp Thr Ser Tyr Thr Thr
    2290                2295                2300
Ser Thr Ile Thr Arg His Thr Val Asp Gln Thr Arg Pro Gly Ala Phe
2305                2310                2315                2320
Glu Arg Glu Thr Val Ile Thr Met Ser Gly Asp Asp His Pro Arg Ala
                2325                2330                2335
Phe Val Leu Asp Glu Cys Gln Asn Leu Met Phe Trp Thr Asn Trp Asn
            2340                2345                2350
Glu Gln His Pro Ser Ile Met Arg Ala Ala Leu Ser Gly Ala Asn Val
        2355                2360                2365
Leu Thr Leu Ile Glu Lys Asp Ile Arg Thr Pro Asn Gly Leu Ala Ile
    2370                2375                2380
Asp His Arg Ala Glu Lys Leu Tyr Phe Ser Asp Ala Thr Leu Asp Lys
2385                2390                2395                2400
Ile Glu Arg Cys Glu Tyr Asp Gly Ser His Arg Tyr Val Ile Leu Lys
                2405                2410                2415
Ser Glu Pro Val His Pro Phe Gly Leu Ala Val Tyr Gly Glu His Ile
            2420                2425                2430
Phe Trp Thr Asp Trp Val Arg Arg Ala Val Gln Arg Ala Asn Lys His
        2435                2440                2445
Val Gly Ser Asn Met Lys Leu Leu Arg Val Asp Ile Pro Gln Gln Pro
    2450                2455                2460
Met Gly Ile Ile Ala Val Ala Asn Asp Thr Asn Ser Cys Glu Leu Ser
2465                2470                2475                2480
Pro Cys Arg Ile Asn Asn Gly Gly Cys Gln Asp Leu Cys Leu Leu Thr
                2485                2490                2495
His Gln Gly His Val Asn Cys Ser Cys Arg Gly Gly Arg Ile Leu Gln
            2500                2505                2510
Asp Asp Leu Thr Cys Arg Ala Val Asn Ser Ser Cys Arg Ala Gln Asp
        2515                2520                2525
Glu Phe Glu Cys Ala Asn Gly Glu Cys Ile Asn Phe Ser Leu Thr Cys
    2530                2535                2540
Asp Gly Val Pro His Cys Lys Asp Lys Ser Asp Glu Lys Pro Ser Tyr
2545                2550                2555                2560
Cys Asn Ser Arg Arg Cys Lys Lys Thr Phe Arg Gln Cys Ser Asn Gly
                2565                2570                2575
Arg Cys Val Ser Asn Met Leu Trp Cys Asn Gly Ala Asp Asp Cys Gly
            2580                2585                2590
Asp Gly Ser Asp Glu Ile Pro Cys Asn Lys Thr Ala Cys Gly Val Gly
        2595                2600                2605
Glu Phe Arg Cys Arg Asp Gly Thr Cys Ile Gly Asn Ser Ser Arg Cys
    2610                2615                2620
Asn Gln Phe Val Asp Cys Glu Asp Ala Ser Asp Glu Met Asn Cys Ser
2625                2630                2635                2640
Ala Thr Asp Cys Ser Ser Tyr Phe Arg Leu Gly Val Lys Gly Val Leu
                2645                2650                2655
Phe Gln Pro Cys Glu Arg Thr Ser Leu Cys Tyr Ala Pro Ser Trp Val
            2660                2665                2670
Cys Asp Gly Ala Asn Asp Cys Gly Asp Tyr Ser Asp Glu Arg Asp Cys
        2675                2680                2685
Pro Gly Val Lys Arg Pro Arg Cys Pro Leu Asn Tyr Phe Ala Cys Pro
    2690                2695                2700
Ser Gly Arg Cys Ile Pro Met Ser Trp Thr Cys Asp Lys Glu Asp Asp
2705                2710                2715                2720
Cys Glu His Gly Glu Asp Glu Thr His Cys Asn Lys Phe Cys Ser Glu
                2725                2730                2735
Ala Gln Phe Glu Cys Gln Asn His Arg Cys Ile Ser Lys Gln Trp Leu
            2740                2745                2750
Cys Asp Gly Ser Asp Asp Cys Gly Asp Gly Ser Asp Glu Ala Ala His
        2755                2760                2765
Cys Glu Gly Lys Thr Cys Gly Pro Ser Ser Phe Ser Cys Pro Gly Thr
    2770                2775                2780
His Val Cys Val Pro Glu Arg Trp Leu Cys Asp Gly Asp Lys Asp Cys
2785                2790                2795                2800
Ala Asp Gly Ala Asp Glu Ser Ile Ala Ala Gly Cys Leu Tyr Asn Ser
                2805                2810                2815
Thr Cys Asp Asp Arg Glu Phe Met Cys Gln Asn Arg Gln Cys Ile Pro
            2820                2825                2830
Lys His Phe Val Cys Asp His Asp Arg Asp Cys Ala Asp Gly Ser Asp
        2835                2840                2845
Glu Ser Pro Glu Cys Glu Tyr Pro Thr Cys Gly Pro Ser Glu Phe Arg
    2850                2855                2860
Cys Ala Asn Gly Arg Cys Leu Ser Ser Arg Gln Trp Glu Cys Asp Gly
2865                2870                2875                2880
Glu Asn Asp Cys His Asp Gln Ser Asp Glu Ala Pro Lys Asn Pro His
                2885                2890                2895
Cys Thr Ser Pro Glu His Lys Cys Asn Ala Ser Ser Gln Phe Leu Cys
            2900                2905                2910
Ser Ser Gly Arg Cys Val Ala Glu Ala Leu Leu Cys Asn Gly Gln Asp
        2915                2920                2925
Asp Cys Gly Asp Ser Ser Asp Glu Arg Gly Cys His Ile Asn Glu Cys
    2930                2935                2940
Leu Ser Arg Lys Leu Ser Gly Cys Ser Gln Asp Cys Glu Asp Leu Lys
2945                2950                2955                2960
Ile Gly Phe Lys Cys Arg Cys Arg Pro Gly Phe Arg Leu Lys Asp Asp
                2965                2970                2975
Gly Arg Thr Cys Ala Asp Val Asp Glu Cys Ser Thr Thr Phe Pro Cys
            2980                2985                2990
Ser Gln Arg Cys Ile Asn Thr His Gly Ser Tyr Lys Cys Leu Cys Val
        2995                3000                3005
Glu Gly Tyr Ala Pro Arg Gly Gly Asp Pro His Ser Cys Lys Ala Val
    3010                3015                3020
Thr Asp Glu Glu Pro Phe Leu Ile Phe Ala Asn Arg Tyr Tyr Leu Arg
3025                3030                3035                3040
Lys Leu Asn Leu Asp Gly Ser Asn Tyr Thr Leu Leu Lys Gln Gly Leu
                3045                3050                3055
Asn Asn Ala Val Ala Leu Asp Phe Asp Tyr Arg Glu Gln Met Ile Tyr
            3060                3065                3070
Trp Thr Asp Val Thr Thr Gln Gly Ser Met Ile Arg Arg Met His Leu
        3075                3080                3085
Asn Gly Ser Asn Val Gln Val Leu His Arg Thr Gly Leu Ser Asn Pro
    3090                3095                3100
Asp Gly Leu Ala Val Asp Trp Val Gly Gly Asn Leu Tyr Trp Cys Asp
3105                3110                3115                3120
Lys Gly Arg Asp Thr Ile Glu Val Ser Lys Leu Asn Gly Ala Tyr Arg
                3125                3130                3135
Thr Val Leu Val Ser Ser Gly Leu Arg Glu Pro Arg Ala Leu Val Val
            3140                3145                3150
Asp Val Gln Asn Gly Tyr Leu Tyr Trp Thr Asp Trp Gly Asp His Ser
        3155                3160                3165
Leu Ile Gly Arg Ile Gly Met Asp Gly Ser Ser Arg Ser Val Ile Val
    3170                3175                3180
Asp Thr Lys Ile Thr Trp Pro Asn Gly Leu Thr Leu Asp Tyr Val Thr
3185                3190                3195                3200
Glu Arg Ile Tyr Trp Ala Asp Ala Arg Glu Asp Tyr Ile Glu Phe Ala
                3205                3210                3215
Ser Leu Asp Gly Ser Asn Arg His Val Val Leu Ser Gln Asp Ile Pro
            3220                3225                3230
His Ile Phe Ala Leu Thr Leu Phe Glu Asp Tyr Val Tyr Trp Thr Asp
        3235                3240                3245
Trp Glu Thr Lys Ser Ile Asn Arg Ala His Lys Thr Thr Gly Thr Asn
    3250                3255                3260
Lys Thr Leu Leu Ile Ser Thr Leu His Arg Pro Met Asp Leu His Val
3265                3270                3275                3280
Phe His Ala Leu Arg Gln Pro Asp Val Pro Asn His Pro Cys Lys Val
                3285                3290                3295
Asn Asn Gly Gly Cys Ser Asn Leu Cys Leu Leu Ser Pro Gly Gly Gly
            3300                3305                3310
His Lys Cys Ala Cys Pro Thr Asn Phe Tyr Leu Gly Ser Asp Gly Arg
        3315                3320                3325
Thr Cys Val Ser Asn Cys Thr Ala Ser Gln Phe Val Cys Lys Asn Asp
    3330                3335                3340
Lys Cys Ile Pro Phe Trp Trp Lys Cys Asp Thr Glu Asp Asp Cys Gly
3345                3350                3355                3360
Asp His Ser Asp Glu Pro Pro Asp Cys Pro Glu Phe Lys Cys Arg Pro
                3365                3370                3375
Gly Gln Phe Gln Cys Ser Thr Gly Ile Cys Thr Asn Pro Ala Phe Ile
            3380                3385                3390
Cys Asp Gly Asp Asn Asp Cys Gln Asp Asn Ser Asp Glu Ala Asn Cys
        3395                3400                3405
Asp Ile His Val Cys Leu Pro Ser Gln Phe Lys Cys Thr Asn Thr Asn
    3410                3415                3420
Arg Cys Ile Pro Gly Ile Phe Arg Cys Asn Gly Gln Asp Asn Cys Gly
3425                3430                3435                3440
Asp Gly Glu Asp Glu Arg Asp Cys Pro Glu Val Thr Cys Ala Pro Asn
                3445                3450                3455
Gln Phe Gln Cys Ser Ile Thr Lys Arg Cys Ile Pro Arg Val Trp Val
            3460                3465                3470
Cys Asp Arg Asp Asn Asp Cys Val Asp Gly Ser Asp Glu Pro Ala Asn
        3475                3480                3485
Cys Thr Gln Met Thr Cys Gly Val Asp Glu Phe Arg Cys Lys Asp Ser
    3490                3495                3500
Gly Arg Cys Ile Pro Ala Arg Trp Lys Cys Asp Gly Glu Asp Asp Cys
3505                3510                3515                3520
Gly Asp Gly Ser Asp Glu Pro Lys Glu Glu Cys Asp Glu Arg Thr Cys
                3525                3530                3535
Glu Pro Tyr Gln Phe Arg Cys Lys Asn Asn Arg Cys Val Pro Gly Arg
            3540                3545                3550
Trp Gln Cys Asp Tyr Asp Asn Asp Cys Gly Asp Asn Ser Asp Glu Glu
        3555                3560                3565
Ser Cys Thr Pro Arg Pro Cys Ser Glu Ser Glu Phe Ser Cys Ala Asn
    3570                3575                3580
Gly Arg Cys Ile Ala Gly Arg Trp Lys Cys Asp Gly Asp His Asp Cys
3585                3590                3595                3600
Ala Asp Gly Ser Asp Glu Lys Asp Cys Thr Pro Arg Cys Asp Met Asp
                3605                3610                3615
Gln Phe Gln Cys Lys Ser Gly His Cys Ile Pro Leu Arg Trp Arg Cys
            3620                3625                3630
Asp Ala Asp Ala Asp Cys Met Asp Gly Ser Asp Glu Glu Ala Cys Gly
        3635                3640                3645
Thr Gly Val Arg Thr Cys Pro Leu Asp Glu Phe Gln Cys Asn Asn Thr
    3650                3655                3660
Leu Cys Lys Pro Leu Ala Trp Lys Cys Asp Gly Glu Asp Asp Cys Gly
3665                3670                3675                3680
Asp Asn Ser Asp Glu Asn Pro Glu Glu Cys Ala Arg Phe Val Cys Pro
                3685                3690                3695
Pro Asn Arg Pro Phe Arg Cys Lys Asn Asp Arg Val Cys Leu Trp Ile
            3700                3705                3710
Gly Arg Gln Cys Asp Gly Thr Asp Asn Cys Gly Asp Gly Thr Asp Glu
        3715                3720                3725
Glu Asp Cys Glu Pro Pro Thr Ala His Thr Thr His Cys Lys Asp Lys
    3730                3735                3740
Lys Glu Phe Leu Cys Arg Asn Gln Arg Cys Leu Ser Ser Ser Leu Arg
3745                3750                3755                3760
Cys Asn Met Phe Asp Asp Cys Gly Asp Gly Ser Asp Glu Glu Asp Cys
                3765                3770                3775
Ser Ile Asp Pro Lys Leu Thr Ser Cys Ala Thr Asn Ala Ser Ile Cys
            3780                3785                3790
Gly Asp Glu Ala Arg Cys Val Arg Thr Glu Lys Ala Ala Tyr Cys Ala
        3795                3800                3805
Cys Arg Ser Gly Phe His Thr Val Pro Gly Gln Pro Gly Cys Gln Asp
    3810                3815                3820
Ile Asn Glu Cys Leu Arg Phe Gly Thr Cys Ser Gln Leu Cys Asn Asn
3825                3830                3835                3840
Thr Lys Gly Gly His Leu Cys Ser Cys Ala Arg Asn Phe Met Lys Thr
                3845                3850                3855
His Asn Thr Cys Lys Ala Glu Gly Ser Glu Tyr Gln Val Leu Tyr Ile
            3860                3865                3870
Ala Asp Asp Asn Glu Ile Arg Ser Leu Phe Pro Gly His Pro His Ser
        3875                3880                3885
Ala Tyr Glu Gln Ala Phe Gln Gly Asp Glu Ser Val Arg Ile Asp Ala
    3890                3895                3900
Met Asp Val His Val Lys Ala Gly Arg Val Tyr Trp Thr Asn Trp His
3905                3910                3915                3920
Thr Gly Thr Ile Ser Tyr Arg Ser Leu Pro Pro
Ala Ala Pro Pro Thr    #              39350    - Thr Ser Asn Arg His Arg Arg Gln Ile Asp Ar - #g Gly Val Thr His Leu    #          39505    - Asn Ile Ser Gly Leu Lys Met Pro Arg Gly Il - #e Ala Ile Asp Trp Val    #      39650    - Ala Gly Asn Val Tyr Trp Thr Asp Ser Gly Ar - #g Asp Val Ile Glu Val    #  39805    - Ala Gln Met Lys Gly Glu Asn Arg Lys Thr Le - #u Ile Ser Gly Met Ile    #               40003990 - #                3995    - Asp Glu Pro His Ala Ile Val Val Asp Pro Le - #u Arg Gly Thr Met Tyr    #              40150    - Trp Ser Asp Trp Gly Asn His Pro Lys Ile Gl - #u Thr Ala Ala Met Asp    #          40305    - Gly Thr Leu Arg Glu Thr Leu Val Gln Asp As - #n Ile Gln Trp Pro Thr    #      40450    - Gly Leu Ala Val Asp Tyr His Asn Glu Arg Le - #u Tyr Trp Ala Asp Ala    #  40605    - Lys Leu Ser Val Ile Gly Ser Ile Arg Leu As - #n Gly Thr Asp Pro Ile    #               40804070 - #                4075    - Val Ala Ala Asp Ser Lys Arg Gly Leu Ser Hi - #s Pro Phe Ser Ile Asp    #              40950    - Val Phe Glu Asp Tyr Ile Tyr Gly Val Thr Ty - #r Ile Asn Asn Arg Val    #          41105    - Phe Lys Ile His Lys Phe Gly His Ser Pro Le - #u Val Asn Leu Thr Gly    #      41250    - Gly Leu Ser His Ala Ser Asp Val Val Leu Ty - #r His Gln His Lys Gln    #  41405    - Pro Glu Val Thr Asn Pro Cys Asp Arg Lys Ly - #s Cys Glu Trp Leu Cys    #               41604150 - #                4155    - Leu Leu Ser Pro Ser Gly Pro Val Cys Thr Cy - #s Pro Asn Gly Lys Arg    #              41750    - Leu Asp Asn Gly Thr Cys Val Pro Val Pro Se - #r Pro Thr Pro Pro Pro    #          41905    - Asp Ala Pro Arg Pro Gly Thr Cys Asn Leu Gl - #n Cys Phe Asn Gly Gly    #      42050    - Ser Cys Phe Leu Asn Ala Arg Arg Gln Pro Ly - #s Cys Arg Cys Gln Pro    #  42205    - Arg Tyr Thr Gly Asp Lys Cys Glu Leu Asp Gl - #n Cys Trp Glu His Cys    #               42404230 - #                4235    - Arg Asn Gly Gly Thr Cys Ala Ala Ser Pro Se - #r Gly Met Pro Thr Cys    #              42550 
   - Arg Cys Pro Thr Gly Phe Thr Gly Pro Lys Cy - #s Thr Gln Gln Val Cys    #          42705    - Ala Gly Tyr Cys Ala Asn Asn Ser Thr Cys Th - #r Val Asn Gln Gly Asn    #      42850    - Gln Pro Gln Cys Arg Cys Leu Pro Gly Phe Le - #u Gly Asp Arg Cys Gln    #  43005    - Tyr Arg Gln Cys Ser Gly Tyr Cys Glu Asn Ph - #e Gly Thr Cys Gln Met    #               43204310 - #                4315    - Ala Ala Asp Gly Ser Arg Gln Cys Arg Cys Th - #r Ala Tyr Phe Glu Gly    #              43350    - Ser Arg Cys Glu Val Asn Lys Cys Ser Arg Cy - #s Leu Glu Gly Ala Cys    #          43505    - Val Val Asn Lys Gln Ser Gly Asp Val Thr Cy - #s Asn Cys Thr Asp Gly    #      43650    - Arg Val Ala Pro Ser Cys Leu Thr Cys Val Gl - #y His Cys Ser Asn Gly    #  43805    - Gly Ser Cys Thr Met Asn Ser Lys Met Met Pr - #o Glu Cys Gln Cys Pro    #               44004390 - #                4395    - Pro His Met Thr Gly Pro Arg Cys Glu Glu Hi - #s Val Phe Ser Gln Gln    #              44150    - Gln Pro Gly His Ile Ala Ser Ile Leu Ile Pr - #o Leu Leu Leu Leu Leu    #          44305    - Leu Leu Val Leu Val Ala Gly Val Val Phe Tr - #p Tyr Lys Arg Arg Val    #      44450    - Gln Gly Ala Lys Gly Phe Gln His Gln Arg Me - #t Thr Asn Gly Ala Met    #  44605    - Asn Val Glu Ile Gly Asn Pro Thr Tyr Lys Me - #t Tyr Glu Gly Gly Glu    #               44804470 - #                4475    - Pro Asp Asp Val Gly Gly Leu Leu Asp Ala As - #p Phe Ala Leu Asp Pro    #              44950    - Asp Lys Pro Thr Asn Phe Thr Asn Pro Val Ty - #r Ala Thr Leu Tyr Met    #          45105    - Gly Gly His Gly Ser Arg His Ser Leu Ala Se - #r Thr Asp Glu Lys Arg    #      45250    - Glu Leu Leu Gly Arg Gly Pro Glu Asp Glu Il - #e Gly Asp Pro Leu Ala    #  45405    - (2) INFORMATION FOR SEQ ID NO: 53:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 487 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #53:  (xi) 
SEQUENCE DESCRIPTION: SEQ ID NO:    - Met Ala Gly Leu Leu His Leu Val Leu Leu Se - #r Thr Ala Leu Gly Gly    #                15    - Leu Leu Arg Pro Ala Gly Ser Val Phe Leu Pr - #o Arg Asp Gln Ala His    #            30    - Arg Val Leu Gln Arg Ala Arg Arg Ala Asn Se - #r Phe Leu Glu Glu Val    #        45    - Lys Gln Gly Asn Leu Glu Arg Glu Cys Leu Gl - #u Glu Ala Cys Ser Leu    #    60    - Glu Glu Ala Arg Glu Val Phe Glu Asp Ala Gl - #u Gln Thr Asp Glu Phe    #80    - Trp Ser Lys Tyr Lys Asp Gly Asp Gln Cys Gl - #u Gly His Pro Cys Leu    #                95    - Asn Gln Gly His Cys Lys Asp Gly Ile Gly As - #p Tyr Thr Cys Thr Cys    #           110    - Ala Glu Gly Phe Glu Gly Lys Asn Cys Glu Ph - #e Ser Thr Arg Glu Ile    #       125    - Cys Ser Leu Asp Asn Gly Gly Cys Asp Gln Ph - #e Cys Arg Glu Glu Arg    #   140    - Ser Glu Val Arg Cys Ser Cys Ala His Gly Ty - #r Val Leu Gly Asp Asp    145                 1 - #50                 1 - #55                 1 -    #60    - Ser Lys Ser Cys Val Ser Thr Glu Arg Phe Pr - #o Cys Gly Lys Phe Thr    #               175    - Gln Gly Arg Ser Arg Arg Trp Ala Ile His Th - #r Ser Glu Asp Ala Leu    #           190    - Asp Ala Ser Glu Leu Glu His Tyr Asp Pro Al - #a Asp Leu Ser Pro Thr    #       205    - Glu Ser Ser Leu Asp Leu Leu Gly Leu Asn Ar - #g Thr Glu Pro Ser Ala    #   220    - Gly Glu Asp Gly Ser Gln Val Val Arg Ile Va - #l Gly Gly Arg Asp Cys    225                 2 - #30                 2 - #35                 2 -    #40    - Ala Glu Gly Glu Cys Pro Trp Gln Ala Leu Le - #u Val Asn Glu Glu Asn    #               255    - Glu Gly Phe Cys Gly Gly Thr Ile Leu Asn Gl - #u Phe Tyr Val Leu Thr    #           270    - Ala Ala His Cys Leu His Gln Ala Lys Arg Ph - #e Thr Val Arg Val Gly    #       285    - Asp Arg Asn Thr Glu Gln Glu Glu Gly Asn Gl - #u Met Ala His Glu Val    #   300    - Glu Met Thr Val Lys His Ser Arg Phe Val Ly - #s Glu Thr Tyr Asp Phe    305                 3 - #10                 3 - #15          
       3 -    #20    - Asp Ile Ala Val Leu Arg Leu Lys Thr Pro Il - #e Arg Phe Arg Arg Asn    #               335    - Val Ala Pro Ala Cys Leu Pro Glu Lys Asp Tr - #p Ala Glu Ala Thr Leu    #           350    - Met Thr Gln Lys Thr Gly Ile Val Ser Gly Ph - #e Gly Arg Thr His Glu    #       365    - Lys Gly Arg Leu Ser Ser Thr Leu Lys Met Le - #u Glu Val Pro Tyr Val    #   380    - Asp Arg Ser Thr Cys Lys Leu Ser Ser Ser Ph - #e Thr Ile Thr Pro Asn    385                 3 - #90                 3 - #95                 4 -    #00    - Met Phe Cys Ala Gly Tyr Asp Thr Gln Pro Gl - #u Asp Ala Cys Gln Gly    #               415    - Asp Ser Gly Gly Pro His Val Thr Arg Phe Ly - #s Asp Thr Tyr Phe Val    #           430    - Thr Gly Ile Val Ser Trp Gly Glu Gly Cys Al - #a Arg Lys Gly Lys Phe    #       445    - Gly Val Tyr Thr Lys Val Ser Asn Phe Leu Ly - #s Trp Ile Asp Lys Ile    #   460    - Met Lys Ala Arg Ala Gly Ala Ala Gly Ser Ar - #g Gly His Ser Glu Ala    465                 4 - #70                 4 - #75                 4 -    #80    - Pro Ala Thr Trp Thr Val Pro                    485    - (2) INFORMATION FOR SEQ ID NO: 54:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 790 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #54:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - Glu Pro Leu Asp Asp Tyr Val Asn Thr Gln Gl - #y Ala Ser Leu Phe Ser    #                15    - Val Thr Lys Lys Gln Leu Gly Ala Gly Ser Il - #e Glu Glu Cys Ala Ala    #            30    - Lys Cys Glu Glu Asp Glu Glu Phe Thr Cys Ar - #g Ala Phe Gln Tyr His    #        45    - Ser Lys Glu Gln Gln Cys Val Ile Met Ala Gl - #u Asn Arg Lys Ser Ser    #    60    - Ile Ile Arg Met Arg Asp Val Val Leu Phe Gl - #u Lys Lys Val Tyr Leu    #80    - Ser Glu Cys Lys Thr Gly Asn Gly Lys Asn Ty - #r Arg Gly Thr Met Ser    #                95    - Lys Thr Lys Asn Gly Ile Thr Cys Gln Lys Tr - #p Ser Ser 
Thr Ser Pro    #           110    - His Arg Pro Arg Phe Ser Pro Ala Thr His Pr - #o Ser Glu Gly Leu Glu    #       125    - Glu Asn Tyr Cys Arg Asn Pro Asp Asn Asp Pr - #o Gln Gly Pro Trp Cys    #   140    - Tyr Thr Thr Asp Pro Glu Lys Arg Tyr Asp Ty - #r Cys Asp Ile Leu Glu    145                 1 - #50                 1 - #55                 1 -    #60    - Cys Glu Glu Glu Cys Met His Cys Ser Gly Gl - #u Asn Tyr Asp Gly Lys    #               175    - Ile Ser Lys Thr Met Ser Gly Leu Glu Cys Gl - #n Ala Trp Asp Ser Gln    #           190    - Ser Pro His Ala His Gly Tyr Ile Pro Ser Ly - #s Phe Pro Asn Lys Asn    #       205    - Leu Lys Lys Asn Tyr Cys Arg Asn Pro Asp Ar - #g Glu Leu Arg Pro Trp    #   220    - Cys Phe Thr Thr Asp Pro Asn Lys Arg Trp Gl - #u Leu Cys Asp Ile Pro    225                 2 - #30                 2 - #35                 2 -    #40    - Arg Cys Thr Thr Pro Pro Pro Ser Ser Gly Pr - #o Thr Tyr Gln Cys Leu    #               255    - Lys Gly Thr Gly Glu Asn Tyr Arg Gly Asn Va - #l Ala Val Thr Val Ser    #           270    - Gly His Thr Cys Gln His Trp Ser Ala Gln Th - #r Pro His Thr His Asn    #       285    - Arg Thr Pro Glu Asn Phe Pro Cys Lys Asn Le - #u Asp Glu Asn Tyr Cys    #   300    - Arg Asn Pro Asp Gly Lys Arg Ala Pro Trp Cy - #s His Thr Thr Asn Ser    305                 3 - #10                 3 - #15                 3 -    #20    - Gln Val Arg Trp Glu Tyr Cys Lys Ile Pro Se - #r Cys Asp Ser Ser Pro    #               335    - Val Ser Thr Glu Glu Leu Ala Pro Thr Ala Pr - #o Pro Glu Leu Thr Pro    #           350    - Val Val Gln Asp Cys Tyr His Gly Asp Gly Gl - #n Ser Tyr Arg Gly Thr    #       365    - Ser Ser Thr Thr Thr Thr Gly Lys Lys Cys Gl - #n Ser Trp Ser Ser Met    #   380    - Thr Pro His Arg His Gln Lys Thr Pro Glu As - #n Tyr Pro Asn Ala Gly    385                 3 - #90                 3 - #95                 4 -    #00    - Leu Thr Met Asn Tyr Cys Arg Asn Pro Asp Al - #a Asp Lys Gly Pro Trp    #               415  
  - Cys Phe Thr Thr Asp Pro Ser Val Arg Trp Gl - #u Tyr Cys Asn Leu Lys    #           430    - Lys Cys Ser Gly Thr Glu Ala Ser Val Val Al - #a Pro Pro Pro Val Val    #       445    - Leu Leu Pro Asn Val Glu Thr Pro Ser Glu Gl - #u Asp Cys Met Phe Gly    #   460    - Asn Gly Lys Gly Tyr Arg Gly Lys Arg Ala Th - #r Thr Val Thr Gly Thr    465                 4 - #70                 4 - #75                 4 -    #80    - Pro Cys Gln Asp Trp Ala Ala Gln Glu Pro Hi - #s Arg His Ser Ile Phe    #               495    - Thr Pro Glu Thr Asn Pro Arg Ala Gly Leu Gl - #u Lys Asn Tyr Cys Arg    #           510    - Asn Pro Asp Gly Asp Val Gly Gly Pro Trp Cy - #s Tyr Thr Thr Asn Pro    #       525    - Arg Lys Leu Tyr Asp Tyr Cys Asp Val Pro Gl - #n Cys Ala Ala Pro Ser    #   540    - Phe Asp Cys Gly Lys Pro Gln Val Glu Pro Ly - #s Lys Cys Pro Gly Arg    545                 5 - #50                 5 - #55                 5 -    #60    - Val Val Gly Gly Cys Val Ala His Pro His Se - #r Trp Pro Trp Gln Val    #               575    - Ser Leu Arg Thr Arg Phe Gly Met His Phe Cy - #s Gly Gly Thr Leu Ile    #           590    - Ser Pro Glu Trp Val Leu Thr Ala Ala His Cy - #s Leu Glu Lys Ser Pro    #       605    - Arg Pro Ser Ser Tyr Lys Val Ile Leu Gly Al - #a His Gln Glu Val Asn    #   620    - Leu Glu Pro His Val Gln Glu Ile Glu Val Se - #r Arg Leu Phe Leu Glu    625                 6 - #30                 6 - #35                 6 -    #40    - Pro Thr Arg Lys Asp Ile Ala Leu Leu Lys Le - #u Ser Ser Pro Ala Val    #               655    - Ile Thr Asp Lys Val Ile Pro Ala Cys Leu Pr - #o Ser Pro Asn Tyr Val    #           670    - Val Ala Asp Arg Thr Glu Cys Phe Ile Thr Gl - #y Trp Gly Glu Thr Gln    #       685    - Gly Thr Phe Gly Ala Gly Leu Leu Lys Glu Al - #a Gln Leu Pro Val Ile    #   700    - Glu Asn Lys Val Cys Asn Arg Tyr Glu Phe Le - #u Asn Gly Arg Val Gln    705                 7 - #10                 7 - #15                 7 -    #20    - Ser Thr Glu Leu Cys Ala Gly His 
Leu Ala Gl - #y Gly Thr Asp Ser Cys    #               735    - Gln Gly Asp Ser Gly Gly Pro Leu Val Cys Ph - #e Glu Lys Asp Lys Tyr    #           750    - Ile Leu Gln Gly Val Thr Ser Trp Gly Leu Gl - #y Cys Ala Arg Pro Asn    #       765    - Lys Pro Gly Val Tyr Val Arg Val Ser Arg Ph - #e Val Thr Trp Ile Glu    #   780    - Gly Val Met Arg Asn Asn    785                 7 - #90    - (2) INFORMATION FOR SEQ ID NO: 55:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 153 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #55:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - Val Tyr Leu Gln Thr Ser Leu Lys Tyr Asn Il - #e Leu Pro Glu Lys Glu    #                15    - Glu Phe Pro Phe Ala Leu Gly Val Gln Thr Le - #u Pro Gln Thr Cys Asp    #            30    - Glu Pro Lys Ala His Thr Ser Phe Gln Ile Se - #r Leu Ser Val Ser Tyr    #        45    - Thr Gly Ser Arg Ser Ala Ser Asn Met Ala Il - #e Val Asp Val Lys Met    #    60    - Val Ser Gly Phe Ile Pro Leu Lys Pro Thr Va - #l Lys Met Leu Glu Arg    #80    - Ser Asn His Val Ser Arg Thr Glu Val Ser Se - #r Asn His Val Leu Ile    #                95    - Tyr Leu Asp Lys Val Ser Asn Gln Thr Leu Se - #r Leu Phe Phe Thr Val    #           110    - Leu Gln Asp Val Pro Val Arg Asp Leu Lys Pr - #o Ala Ile Val Lys Val    #       125    - Tyr Asp Tyr Tyr Glu Thr Asp Glu Phe Ala Il - #e Ala Glu Tyr Asn Ala    #   140    - Pro Cys Ser Lys Asp Leu Gly Asn Ala    145                 1 - #50    - (2) INFORMATION FOR SEQ ID NO: 56:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 202 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #56:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - Met Glu Leu Trp Gly Ala Tyr Leu Leu Leu Cy - #s Leu Phe Ser Leu Leu    #                15    - Thr Gln Val Thr Thr Glu Pro Pro Thr Gln Ly - #s 
Pro Lys Lys Ile Val    #            30    - Asn Ala Lys Lys Asp Val Val Asn Thr Lys Me - #t Phe Glu Glu Leu Lys    #        45    - Ser Arg Leu Asp Thr Leu Ala Gln Glu Val Al - #a Leu Leu Lys Glu Gln    #    60    - Gln Ala Leu Gln Thr Val Cys Leu Lys Gly Th - #r Lys Val His Met Lys    #80    - Cys Phe Leu Ala Phe Thr Gln Thr Lys Thr Ph - #e His Glu Ala Ser Glu    #                95    - Asp Cys Ile Ser Arg Gly Gly Thr Leu Ser Th - #r Pro Gln Thr Gly Ser    #           110    - Glu Asn Asp Ala Leu Tyr Glu Tyr Leu Arg Gl - #n Ser Val Gly Asn Glu    #       125    - Ala Glu Ile Trp Leu Gly Leu Asn Asp Met Al - #a Ala Glu Gly Thr Trp    #   140    - Val Asp Met Thr Gly Ala Arg Ile Ala Tyr Ly - #s Asn Trp Glu Thr Glu    145                 1 - #50                 1 - #55                 1 -    #60    - Ile Thr Ala Gln Pro Asp Gly Gly Lys Thr Gl - #u Asn Cys Ala Val Leu    #               175    - Ser Gly Ala Ala Asn Gly Lys Trp Phe Asp Ly - #s Arg Cys Arg Asp Gln    #           190    - Leu Pro Tyr Ile Cys Gln Phe Gly Ile Val    #       200    - (2) INFORMATION FOR SEQ ID NO: 57:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 246 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #57:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - Gln Val Lys Leu Gln Gln Ser Gly Ala Glu Le - #u Val Lys Pro Gly Ala    #                15    - Ser Val Lys Met Ser Cys Lys Ala Ser Gly Ty - #r Thr Phe Ala Ser Tyr    #            30    - Trp Ile Asn Trp Val Lys Gln Arg Pro Gly Gl - #n Gly Leu Glu Trp Ile    #        45    - Gly His Ile Tyr Pro Val Arg Ser Ile Thr Ly - #s Tyr Asn Glu Lys Phe    #    60    - Lys Ser Lys Ala Thr Leu Thr Leu Asp Thr Se - #r Ser Ser Thr Ala Tyr    #80    - Met Gln Leu Ser Ser Leu Thr Ser Glu Asp Se - #r Ala Val Tyr Tyr Cys    #                95    - Ser Arg Gly Asp Gly Ser Asp Tyr Tyr Ala Me - #t Asp Tyr Trp Gly Gln    #           110    - Gly Thr Thr 
Val Thr Val Ser Ser Gly Gly Gl - #y Gly Ser Asp Ile Glu    #       125    - Leu Thr Gln Ser Pro Ala Ile Leu Ser Ala Se - #r Pro Gly Gly Lys Val    #   140    - Thr Met Thr Cys Arg Ala Ser Ser Ser Val Se - #r Tyr Met His Trp Tyr    145                 1 - #50                 1 - #55                 1 -    #60    - Gln Gln Lys Pro Gly Ser Ser Pro Lys Pro Tr - #p Ile Tyr Ala Thr Ser    #               175    - Asn Leu Ala Ser Gly Val Pro Thr Arg Phe Se - #r Gly Thr Gly Ser Gly    #           190    - Thr Ser Tyr Ser Leu Thr Ile Ser Arg Val Gl - #u Ala Glu Asp Ala Ala    #       205    - Thr Tyr Tyr Cys Gln Gln Trp Ser Arg Asn Pr - #o Phe Thr Phe Gly Ser    #   220    - Gly Thr Lys Leu Glu Ile Lys Arg Ala Ala Al - #a Glu Gln Lys Leu Ile    225                 2 - #30                 2 - #35                 2 -    #40    - Ser Glu Glu Asp Leu Asn                    245    - (2) INFORMATION FOR SEQ ID NO: 58:    -      (i) SEQUENCE CHARACTERISTICS:    #acids    (A) LENGTH: 101 amino              (B) TYPE: amino acid              (C) STRANDEDNESS:              (D) TOPOLOGY: linear    -     (ii) MOLECULE TYPE: protein    #58:  (xi) SEQUENCE DESCRIPTION: SEQ ID NO:    - Met Ser Asn Thr Gln Ala Glu Arg Ser Ile Il - #e Gly Met Ile Asp Met    #                15    - Phe His Lys Tyr Thr Arg Arg Asp Asp Lys Il - #e Asp Lys Pro Ser Leu    #            30    - Leu Thr Met Met Lys Glu Asn Phe Pro Asn Ph - #e Leu Ser Ala Cys Asp    #        45    - Lys Lys Gly Thr Asn Tyr Leu Ala Asp Val Ph - #e Glu Lys Lys Asp Lys    #    60    - Asn Glu Asp Lys Lys Ile Asp Phe Ser Glu Ph - #e Leu Ser Leu Leu Gly    #80    - Asp Ile Ala Thr Asp Tyr His Lys Gln Ser Hi - #s Gly Ala Ala Pro Cys    #                95    - Ser Gly Gly Ser Gln                100    __________________________________________________________________________

What is claimed is:
1. A method for generating a processed ensemble of polypeptide molecules, in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) the corresponding processed ensemble which has been subjected to one of the cycles only, and B) the polypeptide molecules are molecules which have an amino acid sequence identical to that of an authentic polypeptide, or are molecules which comprise an amino acid sequence corresponding to that of an authentic polypeptide joined to one or two additional polypeptide segments via a cleavable junction or similar or dissimilar cleavable junctions.
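As an illustration only, and not part of the claim, the gradual accumulation of correctly folded molecules over successive denaturation-renaturation cycles can be sketched with a toy population model. The function name, the three rate parameters and their default values are hypothetical; in particular, the assumption that folded molecules are unfolded less readily than misfolded ones in each denaturing step is a modelling choice made for this sketch, not a figure taken from the text.

```python
def simulate_cycles(n_cycles, folded0=0.0, unfold_misfolded=0.5,
                    unfold_folded=0.05, refold_yield=0.3):
    """Toy model: return the folded fraction after each cycle.

    All parameters are illustrative assumptions:
      unfold_misfolded - fraction of misfolded molecules unfolded per denaturing step
      unfold_folded    - fraction of folded molecules unfolded per denaturing step
      refold_yield     - fraction of unfolded molecules that refold correctly
    """
    folded = folded0
    misfolded = 1.0 - folded0
    history = []
    for _ in range(n_cycles):
        # Denaturing step: a fraction of each population becomes unfolded.
        unfolded = unfold_misfolded * misfolded + unfold_folded * folded
        misfolded *= 1.0 - unfold_misfolded
        folded *= 1.0 - unfold_folded
        # Renaturing step: unfolded molecules partition between the
        # correctly folded state and misfolded states.
        folded += refold_yield * unfolded
        misfolded += (1.0 - refold_yield) * unfolded
        history.append(folded)
    return history


if __name__ == "__main__":
    for cycle, fraction in enumerate(simulate_cycles(5), start=1):
        print(f"cycle {cycle}: folded fraction = {fraction:.3f}")
```

Under these assumed rates the folded fraction rises with every cycle, so five cycles yield more folded material than a single cycle, which is the comparison drawn in part A) of the claim.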
2. A method for generating a processed ensemble of polypeptide molecules, in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein the polypeptide molecules are non-covalently adsorbed to a solid or semisolid carrier through a moiety having affinity to a component of the carrier, said polypeptide molecules being substantially confined to said carrier and in contact with an aqueous or an organic liquid phase during the denaturing step and the renaturing step, wherein the liquid can be changed or exchanged substantially without entraining the polypeptide molecules, wherein the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) the corresponding processed ensemble which has been subjected to one of the cycles only.
3. A method according to claim 1, wherein the moiety has an amino acid sequence identical to SEQ ID NO: 47, the carrier comprising a nitrilotriacetic acid (NTA) derivative charged with Ni⁺⁺ ions.
4. A method for generating a processed ensemble of polypeptide molecules, in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding processed ensemble which has been subjected to one of the cycles only, and B) the polypeptide molecules comprise a polypeptide segment which is a substrate for preferential cleavage by a cleaving agent at a specific peptide bond, and said segment is a sequence which is selectively recognized by the bovine coagulation factor X_(a).
5. A method for generating a processed ensemble of polypeptide molecules in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding processed ensemble which has been subjected to one of the cycles only, and B) the polypeptide molecules comprise a polypeptide segment which is in vitro-convertible into a derivatized polypeptide segment which is a substrate for preferential cleavage by a cleaving agent at a specific peptide bond, the polypeptide segment being selectively recognized by the bovine coagulation factor X_(a), and which polypeptide segment has an amino acid sequence selected from the group consisting of SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45, and SEQ ID NO: 46.
6. A method according to claim 5 wherein the polypeptide molecules comprise a polypeptide segment with either a) the amino acid sequence SEQ ID NO: 43 or SEQ ID NO: 44, which is converted into a derivatized polypeptide, which is selectively recognized by bovine coagulation factor X_(a), by reacting the cysteine residue with N-(2-mercaptoethyl)morpholyl-2-thiopyridyl disulphide or mercaptothioacetate-2-thiopyridyl disulphide, or b) with the amino acid sequence SEQ ID NO: 45 or SEQ ID NO: 46, which is converted into a derivatized polypeptide, which is selectively recognized by bovine coagulation factor X_(a), by oxidation of the thioether moiety in the methionine side group to a sulphoxide or sulphone derivative.
7. A method according to claim 4, wherein the polypeptide segment selected from the group consisting of SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42 is linked N-terminally to the authentic polypeptide.
8. A method according to claim 4, wherein the polypeptide segment selected from the group consisting of SEQ ID NO: 43, SEQ ID NO: 44, SEQ ID NO: 45 and SEQ ID NO: 46 is linked N-terminally to the authentic polypeptide.
9. A method for generating a processed ensemble of polypeptide molecules in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding initial ensemble which has been subjected to one of the cycles only; B) the duration of each denaturing step is at least 1 millisecond and at most 1 hour, and the duration of each renaturing step is at least 1 second and at most 12 hours; C) the denaturing conditions of each individual denaturing step are kept substantially constant for a period of time, and the renaturing conditions of each individual renaturing step are kept substantially constant for a period of time, the periods of time during which conditions are kept substantially constant being separated by transition periods during which the conditions are changed, the change of conditions being accomplished by changing the chemical composition of a liquid phase with which the polypeptide molecules are in contact; and D) the denaturing of the polypeptide molecules is achieved or enhanced by decreasing or increasing pH of the liquid phase.
 10. A method for generating a processed ensemble of polypeptide molecules in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding initial ensemble which has been subjected to one of the cycles only; B) the substantial fraction of polypeptide molecules in one particular folded conformation constitutes at least 5% (W/W) of the initial ensemble of polypeptide molecules; C) the polypeptide molecules of the processed ensemble comprise cysteine-containing molecules, and the processed ensemble comprises a substantial fraction of polypeptide molecules in one particular uniform conformation which, in addition, have substantially identical disulphide bridging topology; D) the polypeptide molecules are in contact with an aqueous or organic liquid phase during the denaturing step and the renaturing step, and wherein the liquid phase used in at least one of the denaturing steps and/or in at least one of the renaturing steps contains at least one disulphide-reshuffling system, X; E) the disulphide-reshuffling system contains glutathione, 2-mercaptoethanol or thiocholine, each of which is in admixture with its corresponding symmetrical disulphide; and F) the conversion of the cysteine residues to mixed disulphide products is accomplished by reacting the fully denatured and fully reduced ensemble of polypeptide molecules with an excess of a reagent which is a high-energy mixed disulphide compound.
 11. A method according to claim 10, wherein the high-energy mixed disulphide compound is aliphatic-aromatic.
 12. A method according to claim 10, wherein the high-energy mixed disulphide compound has the general formula: ##STR2## wherein R₁ is a pyridyl, and R₂, R₃ and R₄ are hydrogen or an optionally substituted lower aromatic or aliphatic hydrocarbon group.
 13. A method according to claim 11, wherein the high-energy mixed disulphide compound is selected from the group consisting of glutathionyl-2-thiopyridyl disulphide, 2-thiocholyl-2-thiopyridyl disulphide, 2-mercaptoethanol-2-thiopyridyl disulphide and mercaptoacetate-2-thiopyridyl disulphide.
 14. A method according to claim 10, wherein 1) the chemical changes in the liquid phase are accomplished by changing between a denaturing solution B and a renaturing solution A and 2) the concentration of one or more denaturing compounds in B is decremented after each cycle.
 15. A method for generating a processed ensemble of polypeptide molecules in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble in the presence of a denaturing solution B so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding initial ensemble which has been subjected to one of the cycles only; B) the denaturing and renaturing of the polypeptide molecules is accomplished by direct changes in physical parameters to which the polypeptide molecules are exposed or by changes in physical parameters which enhance or moderate the denaturing and renaturing conditions; and C) the concentration of one or more denaturing compounds in denaturing solution B is kept constant in each cycle.
 16. A method for generating a processed ensemble of polypeptide molecules in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding processed ensemble which has been subjected to one of the cycles only; and B) the series comprises at least 10 and at most 1000 cycles.
 17. A method according to claim 16, wherein the series comprises at least 25 cycles and at most 500 cycles.
 18. A method according to claim 17, wherein the series comprises at most 200 cycles.
 19. A method according to claim 18, wherein the series comprises at most 100 cycles.
 20. A method according to claim 19, wherein the series comprises at most 50 cycles.
 21. A method according to claim 2, wherein the moiety having affinity to a component of the carrier is a biotin group or an analogue thereof bound to an amino acid moiety of the polypeptide, the carrier having avidin, streptavidin, or analogues thereof attached thereto.
 22. A method according to claim 4, wherein the polypeptide segment has an amino acid sequence selected from the group consisting of SEQ ID NO: 38, SEQ ID NO: 40, SEQ ID NO: 41 and SEQ ID NO: 42.
 23. A method according to claim 8, wherein A) denaturing of the polypeptide molecules is accomplished by contacting the polypeptide molecules with a liquid phase in which at least one denaturing compound is dissolved, and wherein renaturing of the polypeptide molecules is accomplished by contacting the polypeptide molecules with a liquid phase which either contains at least one dissolved denaturing compound in such a concentration that the contact with the liquid phase will tend to renature rather than denature the ensemble of polypeptide molecules in their respective conformational states resulting from the preceding step, or contains no denaturing compound; and B) the denaturing compound is selected from the group consisting of urea, guanidine-HCl, dimethylformamide and di-C₁₋₆-alkylsulphone.
 24. A method for generating a processed ensemble of polypeptide molecules, in which processed ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in one particular folded conformation, from an initial ensemble of polypeptide molecules which have the same amino acid sequence as the processed ensemble of polypeptide molecules, in which initial ensemble the conformational states represented contain a substantial fraction of polypeptide molecules in unfolded or misfolded conformations, the method comprising subjecting the initial ensemble of polypeptide molecules to a series of at least five successive cycles, each of which comprises a sequence of 1) at least one denaturing step comprising conditions exerting a denaturing or unfolding influence on the polypeptide molecules of the ensemble so as to denature a fraction of the polypeptides in the ensemble, followed by 2) at least one renaturing step comprising conditions having a renaturing influence on the polypeptide molecules having conformations resulting from the preceding step so as to refold a fraction of the denatured or unfolded polypeptides in the ensemble, wherein: A) the series of at least five successive cycles results in the processed ensemble of polypeptide molecules having a higher fraction of polypeptide molecules in the particular folded conformation than a) the initial ensemble, and b) a corresponding initial ensemble which has been subjected to one of the cycles only; B) the polypeptide molecules are in contact with an aqueous or an organic liquid phase during the denaturing step and the renaturing step; and C) the polarity of the liquid phase used in the renaturing of the polypeptide molecules has been modified by the addition of a salt, a polymer, trifluoroethanol or a combination thereof.
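
By way of illustration only, and forming no part of the claims, the comparison recited in clause A) of the independent claims (a series of cycles versus one cycle only) can be sketched with a toy bookkeeping model. Everything in it is an assumption made for the sketch: the function name `folded_fraction`, the parameters `denatured_per_step` and `refolded_per_step`, their default values, and in particular the simplification that correctly folded molecules are unaffected by the denaturing step. It is not a kinetic model taken from the specification.

```python
def folded_fraction(cycles, denatured_per_step=0.5, refolded_per_step=0.2):
    """Toy model of repeated denaturation-renaturation cycles.

    Assumptions (hypothetical, for illustration only):
      * the initial ensemble is entirely misfolded;
      * each denaturing step unfolds a fixed fraction of the misfolded pool;
      * each renaturing step converts a fixed fraction of the unfolded pool
        to the folded conformation, and the remainder misfolds again;
      * correctly folded molecules are not unfolded by the denaturing step.
    """
    folded, misfolded = 0.0, 1.0
    for _ in range(cycles):
        # Denaturing step: part of the misfolded pool becomes unfolded.
        unfolded = denatured_per_step * misfolded
        misfolded -= unfolded
        # Renaturing step: part of the unfolded pool folds correctly,
        # the rest returns to the misfolded pool.
        newly_folded = refolded_per_step * unfolded
        folded += newly_folded
        misfolded += unfolded - newly_folded
    return folded


if __name__ == "__main__":
    # Compare a single cycle with series of five or more cycles.
    for n in (1, 5, 10, 25):
        print(n, round(folded_fraction(n), 3))
```

Under these assumptions the misfolded pool shrinks by the same factor in every cycle, so the folded fraction after a series of cycles exceeds that after a single cycle; real yields depend on the protein and on the conditions chosen.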